
How can history wield double-edged sword of digital technology

Author: Chen Mirong  Source: Chinese Social Sciences Today  2026-05-02

Tim Hitchcock Photo: PROVIDED TO CSST

A surging wave of big data and AI is crashing against the gates of historical studies, propelling this ancient discipline toward a new crossroads. This technological revolution signifies not merely an upgrading of research tools, but a potential reshaping of historical methodology and a transformation of the very paradigms through which scholars understand the past. As the cutting edge of technology turns toward history’s depths, questions concerning research methods, data bias, academic ethics, and the nature of historical inquiry itself have become increasingly urgent. To explore these issues, CSST recently interviewed Tim Hitchcock, professor emeritus of digital history at the University of Sussex in the United Kingdom.

Distant and close reading

CSST: Are big data and AI driving historical research away from traditional “close reading” toward “distant reading” and macro-analysis? In your view, could these technologies potentially “dehumanize” the field, causing scholars to lose sight of the nuanced contexts and individual experiences within history?

Hitchcock: History as a discipline is headed in both directions: towards more distant reading, and towards more (and better contextualized) close reading. “Distant reading” is revitalizing forms of social science history that depend on charting large-scale change. I am very hopeful that this trend will help ensure the growth of historical studies that seek to explain as well as describe historical processes. Arguably, since the rise of post-modernism in the 1980s, historians have been increasingly timorous about proposing broad models of historical development, to the detriment of the field as a whole. The ability to aggregate massive bodies of historical material and to seek out patterns within that resource via big data and AI promises something quite new. This does not, however, imply a “dehumanization” of history writing. At its best, good social science history uses individual experiences to illustrate broader change, while all forms of history writing demand examples and empathy.

At the same time, big data and AI create new possibilities for close reading. The vector mathematics underlying much of AI relies on locating the meaning and import of a single word in the context of all its many bedfellows. Undertaking the close reading of a line of text, or a short document, in light of every word published that year, by that author, or in that genre promises to reinvigorate and transform close reading as an analytical practice. I look forward to the moment when every word I read, in every document, is accompanied by detailed information about its changing meaning and context. Is the writer using a neologism, or vocabulary popular in their youth? How is that word’s use distributed across genres and between authors? Many of the practices associated with close reading can be made more powerful and more illuminating through AI.

CSST: Might conclusions drawn from AI-driven analysis conflict with those derived from traditional archival-based historical studies? If so, could this give rise to new historical theories?

Hitchcock: AI-driven analysis is not fundamentally different from other forms of analysis. It is perhaps a subset that is likely to privilege measurable change in large-scale datasets, or changes in language use. But historians and corpus linguists have been doing this for generations. There is, however, the possibility that AI-driven analysis will allow different source types to be integrated in novel ways that traditional historians have not yet explored.

In an abstract world of data, I can imagine an AI pulling together every dataset that includes a specific year into a single pool of information, including geological and weather records, ships’ logs, and tree rings. This might produce something entirely novel, but I doubt it. Underlying the question is perhaps an assumption that unsupervised AI might generate a new “explanation” or model of social change, and that we would be forced to accept it simply because there are newly discovered correlations between different data types. But I don’t believe this is how history works. When you strip out the hyperbole, history is just an evidenced argument with the present. AI can provide compelling “evidence” of some “fact,” but it will still need to be argued.

Advancing history with LLMs

CSST: In your opinion, are historical archives themselves filled with biases? If AI models are trained on incomplete and biased datasets, will historical structural biases be repeated or even amplified? What strategies can be employed to identify and correct these biases?

Hitchcock: The very essence of academic history writing is founded in a Western-centric notion of the “archive” that excludes as much as it reveals. And the development of the internet has reinforced all those biases in multiple ways. To take a single example, most early digital history sites were built on the basis of microfilm collections created in Europe and North America between the 1930s and the 1980s. These collections were created to give authority to the most conservative of historical materials, and this conservatism (indeed, racist bigotry) has simply been reproduced online. As a result, the modern “object of study” for most historians (the texts and data that can be found online) already pushes us in specific retrograde directions. AI takes the biases of the “archive” and gives them new legs.

At the same time, there is room for optimism. Much of the response to this issue has taken the form of the analysis of the “silences” in the archive, frequently replacing detailed evidence with imaginative engagement. But there is another approach available. The rise of mass digitization allows us to model the archive to expose what is there, and what is not. The life of an average 19th-century working-class man from Manchester can be traced through 20 different records. The equivalent number for a person in much of the Global South would be a fraction of a single record. We could create a statistical model of the relationship between the historical population of the world and the representation of that population in the archives. This would allow us to measure the relative importance of each document, creating a measure of how much each should contribute to a new historical understanding. If just a handful of documents stand in for the lives of hundreds of thousands of people, they deserve to be thought of differently.

Building this relationship into an AI would help turn it from a system that largely reproduces the biases of inherited texts into something that genuinely advances our understanding. Beyond assisting historical understanding, it would also form part of the process of reforming AI more generally. As it stands, because the internet is Western- and English-centric, most AIs reproduce a Western- and English-centric understanding of the world. Incorporating a precise relationship between text (both online and inherited) and people would help transform large language models (LLMs) into something much more useful, and less prone to reproducing banal platitudes.

CSST: How can historians differentiate between genuine historical insight and statistical coincidence or fallacy? If a deep learning model reveals an unexpected pattern or correlation, how should historians “explain” the AI’s discovery?

Hitchcock: Explaining evidence and patterns, however they are discovered, is simply the job of good history writing. The essence of the historical discipline is sifting through alternative correlations to create an argument, an explanation, that is useful in the present. Whether the correlation was identified via an AI, a spreadsheet, or a census return is irrelevant.

Much more problematic is the possibility of automating the writing process itself, eliminating the historian (however AI-assisted) from the equation. Unlike our treatment of evidence and correlation, this transformation of “writing as thinking” fundamentally undermines the historical process. It is only when we turn data and correlation into evidenced argument that we create history. Outsourcing this process would both undermine the intellectual project and invalidate history as an academic training.

Multilingual pipelines for accessing historical materials

CSST: Generative AI can create highly realistic historical texts and images, and even simulate conversations with historical figures. How do you view the use of this technology in historical research? How should the field address the risk of “deepfaking history” that it invites?

Hitchcock: For the most part, AI-generated historical texts, images, and the like feel more relevant to public history, and to the presentation of the past to a broad interested community, than they do to historical research as a professional and academic discipline. I am put in mind of the “Cast Courts,” or reproductions gallery, at the Victoria and Albert Museum in London. Opened in 1873, it was filled with detailed reproductions of classical and Renaissance architecture and sculpture that would supposedly allow every museum visitor to experience at least a faint reflection of the original. They were deep fakes.

More recently, we have witnessed endless attempts to create historical evidence via dioramas, animatronics, 3D modelling of archaeological sites, and historical re-enactments. As each form of presentation gets better, it demands greater critical attention, and perhaps new tools of analysis. The development of “deep fakes” also implies the need to reformulate the undergraduate curriculum to include critical engagement with digitally produced materials.

CSST: Looking ahead, what is the most exciting prospect for big data and AI in historical research? And simultaneously, what is your most significant warning or caveat for the field?

Hitchcock: The innovation I am most excited about is the development of pipeline systems for accessing historical materials in different languages and scripts. To take a single example, there are millions of pages of early modern Ottoman legal records preserved in Türkiye that historians have not used because they lack the necessary language and palaeography skills. We are rapidly approaching the point where, once a digital image of these sources has been created, Handwritten Text Recognition could generate a transcription in the original Arabic script, allowing for automated translation into another language. This translation could then be used to separate the relevant content from the formulaic text. The abstracted content could then be marked up for statistical analysis via automated semantic and geographical tagging, leading in turn to a resource that would open all these records to historians working anywhere, in any language.

My anxieties focus more on the historical profession than on the technology. At its best, the historical profession takes a self-conscious approach to evidence and argument, endlessly rethinking how we know things about the past and how we use the past in the present. But for the most part, in recent decades the profession has not risen to the challenge of interrogating the changing character of evidence when it is put online. Our endless footnotes cite hard-copy archives and physical books, when for the most part we read books online and visit archives via the web. There is a dystopia out there in which AI-generated “history” is allowed to take the place of the debates and discussions that animate the discipline and justify its role in the wider culture. As a profession, we are well placed to prevent this from happening, but it requires a much clearer sense of how history works, and of our role in its creation.

Editor: Yu Hui
