Developing high-quality structured humanities knowledge bases
FILE PHOTO: The China Biographical Database (CBDB) hosted by Harvard University
Digital and intelligent technologies have introduced new methodologies to traditional humanities disciplines, giving rise to new interdisciplinary fields represented by digital humanities. Breakthroughs have been made in the digitalization, quantification, and visualization of humanities materials.
In the past, the humanities often struggled with a misunderstanding of knowledge structuring, assuming that digitalization was complete once paper materials were scanned into electronic documents or when artifacts were scanned and modeled in 3D. However, scanning merely constitutes data collection. Knowledge structuring involves not only collecting data but also converting data into conceptual nodes and establishing clear, effective relationships between these nodes.
Concepts include people, artifacts, events, time, and places. Relationships encompass interpersonal relationships, object attributes, and people-place relationships. Conceptual nodes and relationships form different structures such as linear structures, hierarchical structures, and networked structures. Efficient retrieval, statistical analysis, and reasoning can only be achieved through highly structured humanities knowledge bases.
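To make the idea of conceptual nodes and relationships concrete, the Python sketch below stores knowledge as (subject, relation, object) triplets and answers simple pattern queries. It is a minimal illustration, not how CBDB or any production knowledge base is implemented. The class name `TripleStore` is hypothetical, and the sample triplets (the Song-dynasty scholar Su Xun, his sons Su Shi and Su Zhe, and Su Shi's birthplace Meishan) are chosen only for illustration.

```python
from collections import defaultdict


class TripleStore:
    """A minimal in-memory store of (subject, relation, object) triplets.

    Conceptual nodes are plain strings here; a real knowledge base would
    give each person, place, or event a stable identifier.
    """

    def __init__(self):
        self.triples = set()
        # Indexes so that lookups by subject or relation avoid a full scan.
        self.by_subject = defaultdict(set)
        self.by_relation = defaultdict(set)

    def add(self, subject, relation, obj):
        triple = (subject, relation, obj)
        self.triples.add(triple)
        self.by_subject[subject].add(triple)
        self.by_relation[relation].add(triple)

    def query(self, subject=None, relation=None, obj=None):
        """Return, sorted, every triplet matching all fields that are not None."""
        if subject is not None:
            candidates = self.by_subject.get(subject, set())
        elif relation is not None:
            candidates = self.by_relation.get(relation, set())
        else:
            candidates = self.triples
        return sorted(
            t for t in candidates
            if (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)
        )


# Illustrative triplets; ("Su Shi", "father", "Su Xun") reads
# "the father of Su Shi is Su Xun".
store = TripleStore()
store.add("Su Shi", "father", "Su Xun")
store.add("Su Zhe", "father", "Su Xun")
store.add("Su Shi", "born in", "Meishan")

# Whose father is Su Xun?
print(store.query(relation="father", obj="Su Xun"))
# [('Su Shi', 'father', 'Su Xun'), ('Su Zhe', 'father', 'Su Xun')]
```

Because every fact has the same three-part shape, retrieval and counting reduce to matching patterns over triplets, something that is hard to do reliably over free-text biographies.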
Knowledge structuring
The China Biographical Database (CBDB), hosted by Harvard University, is a good example of humanities knowledge structuring. Covering over 530,000 biographies from Chinese history, this large-scale knowledge base provides information about more than 640,000 individuals, such as dates of birth and death, kinship, social relations, postings to office, and places in people’s lives. Traditionally, experts spent tremendous effort organizing materials and refining their wording to compose a brief biography of each historical figure, a form suited to qualitative research.
By contrast, CBDB is a structured database that establishes relationships among various attributes of individuals by means of triplets such as <3rd year of Hongwu, Conversion to AD, 1370>.
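The triplet <3rd year of Hongwu, Conversion to AD, 1370> records a calendar conversion. The sketch below shows the arithmetic behind it, assuming the first year of a reign era counts as year 1, so the AD year is the era's starting year plus n - 1; the Hongwu era began in 1368, giving 1368 + 3 - 1 = 1370. The lookup table `ERA_START_AD` and the helper names are hypothetical, and the sketch ignores the offset between lunar and Gregorian year boundaries.

```python
# Hypothetical lookup table: the AD year in which each reign era began.
# Only two Ming-dynasty eras are listed, for illustration.
ERA_START_AD = {
    "Hongwu": 1368,
    "Yongle": 1403,
}


def reign_year_to_ad(era: str, year: int) -> int:
    """Convert the n-th year of a reign era to an AD year.

    The first year of an era is year 1, so AD = start + year - 1.
    Lunar/Gregorian year boundaries are ignored in this simplification.
    """
    if year < 1:
        raise ValueError("reign years are counted from 1")
    return ERA_START_AD[era] + year - 1


def ordinal(n: int) -> str:
    """Return '1st', '2nd', '3rd', '4th', ..., '11th', '12th', '13th', '21st', ..."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def conversion_triplet(era: str, year: int) -> tuple:
    """Build a triplet shaped like <3rd year of Hongwu, Conversion to AD, 1370>."""
    return (f"{ordinal(year)} year of {era}", "Conversion to AD",
            reign_year_to_ad(era, year))


print(conversion_triplet("Hongwu", 3))
# ('3rd year of Hongwu', 'Conversion to AD', 1370)
```

Once such conversions are stored as triplets, dates recorded in different reign eras can be sorted and compared on a single timeline.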
Reasoning
While the concept of “family” is not present in the original CBDB data, the database contains over 400 kinship terms, such as “father,” “eldest son,” and “youngest son,” which cannot be used directly to determine families. However, families can be derived by reasoning over these kinship triplets.
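One way such reasoning could work, sketched here under stated assumptions: first map the many raw kinship terms onto a small set of core relations, then treat each parent/child triplet as a link between two people and group everyone connected by such links using a union-find structure. The `CORE_RELATION` table, the placeholder names ("Person A" and so on), and the choice to define a family as a connected group of parent-child links are all hypothetical; this is not a description of CBDB's actual rules.

```python
# Hypothetical normalization of a few raw kinship terms to core relations.
CORE_RELATION = {
    "father": "parent",
    "mother": "parent",
    "son": "child",
    "eldest son": "child",
    "youngest son": "child",
}


def find(parent, x):
    """Return the representative of x's group (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_families(triplets):
    """Group people linked by parent/child triplets into families.

    Each triplet is (person, kinship_term, relative). Triplets whose term
    is not in CORE_RELATION (e.g. friendships) are skipped.
    """
    parent = {}
    for person, term, relative in triplets:
        if term not in CORE_RELATION:
            continue
        for p in (person, relative):
            parent.setdefault(p, p)
        root_a, root_b = find(parent, person), find(parent, relative)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups
    groups = {}
    for p in parent:
        groups.setdefault(find(parent, p), set()).add(p)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


triplets = [
    ("Person B", "father", "Person A"),        # B's father is A
    ("Person A", "youngest son", "Person C"),  # A's youngest son is C
    ("Person D", "father", "Person E"),
    ("Person B", "friend", "Person D"),        # not kinship: ignored
]
print(group_families(triplets))
# [['Person A', 'Person B', 'Person C'], ['Person D', 'Person E']]
```

Under this definition a family grows to include every generation linked by parent-child ties; a narrower notion (for example, a single household) would need additional rules on top of the normalized relations.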
Challenges
At present, defining concepts and relationships is the greatest challenge in structuring humanities knowledge. For instance, because the system of official posts continually evolved and varied across Chinese dynasties, appropriate conceptual systems and relationship triplets must be created to represent the connections between different official positions. “Events” is currently the most difficult concept to process: a major event may consist of several smaller events, which can in turn be broken down into even smaller ones, and elements such as the parties involved and the time frame also vary from event to event.
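The nesting problem with events can be illustrated by a recursive data structure in which each event may contain sub-events, and each level records its own participants and an optional time span. This is a hypothetical sketch, not a proposed standard; the `Event` class, its fields, and the placeholder event and person names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Event:
    name: str
    participants: List[str] = field(default_factory=list)
    # (start_year, end_year) if known; many recorded events lack exact dates.
    span: Optional[Tuple[int, int]] = None
    sub_events: List["Event"] = field(default_factory=list)


def all_participants(event: Event) -> set:
    """Collect the participants of an event and of all its nested sub-events."""
    found = set(event.participants)
    for sub in event.sub_events:
        found |= all_participants(sub)
    return found


def depth(event: Event) -> int:
    """Number of levels in the event hierarchy (an event with no sub-events has depth 1)."""
    return 1 + max((depth(s) for s in event.sub_events), default=0)


# Placeholder hierarchy: a major event, a mid-level event, two small events.
battle_1 = Event("Small event 1", ["Person A", "Person B"], (1, 1))
battle_2 = Event("Small event 2", ["Person D"])  # date unknown
campaign = Event("Mid-level event", ["Person C"], sub_events=[battle_1, battle_2])
war = Event("Major event", ["Person A"], (1, 5), [campaign])

print(sorted(all_participants(war)))
print(depth(war))
```

Even this toy model shows why events resist a single flat triplet: participants and dates have to be tracked at every level, and queries must walk the hierarchy recursively.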
These issues should be addressed through collaboration between humanities scholars and computer scientists. On the one hand, computer scientists often lack the profound humanities knowledge necessary for making qualitative judgments. On the other hand, high-quality multilingual data on ancient texts and ancient knowledge bases remain scarce.
In the future, the development of structured humanities knowledge bases can be enhanced by building knowledge bases each covering a specific historical period and knowledge area, which can be further integrated into international, comprehensive, multilingual humanities knowledge platforms. This may lead to more humanities research that combines macroscopic and microscopic perspectives, as well as qualitative and quantitative approaches. Moreover, high-quality structured humanities knowledge bases have a wide range of applications in fields such as humanities education, intercultural communication, and science communication.
Li Bin is a professor from the School of Chinese Language and Literature at Nanjing Normal University.
Editor: Yu Hui
Copyright©2023 CSSN All Rights Reserved