
Linguistic data contributes to production in the information age

Author: LI YUMING    Source: Chinese Social Sciences Today    2020-08-04

The factors of production are closely tied to an economic system and to its level of productivity. Data can serve as a factor of production, but it can become one, and be recognized as one, only after informatization has reached a certain stage of development.

As a production factor

On Dec. 8, 2017, Xi Jinping, general secretary of the Communist Party of China (CPC) Central Committee, proposed "to build a digital economy with data as a key factor." Multiple conferences subsequently highlighted this topic, including the first Digital China Summit in Fuzhou in April 2018, the China International Big Data Industry Expo in May 2018, the Jiangsu Internet Conference in September 2018 and the 6th China International Big Data Conference in December 2019. At the 2018 Jiangsu Internet Conference, Wang Xinzhe, chief economist of the Ministry of Industry and Information Technology, emphasized that "the digital economy with data as a key production factor, following the agricultural economy and the industrial economy, is fostering a new economic form."

The fourth plenary session of the 19th CPC Central Committee proposed improving the mechanism by which the market evaluates the contributions of production factors such as labor, capital, land, knowledge, technology, management and data, and determines their rewards according to those contributions.

The proposal affirms that data is, by nature, a factor of production. It places data alongside the other factors and stresses that data, like them, should earn rewards through the market according to its contribution. This is a major theoretical innovation, reflecting a fundamental understanding of the information society and of how the economic system is evolving against the backdrop of a rapidly developing digital economy.

Generally speaking, data is both the manifestation and the carrier of information. Technological and social progress may change the connotation and extension of data, but one thing is certain: most data is linguistic data. Language (including written language) is the most important carrier of human information, and about 80% of information is embedded in language. Information that is not carried by language, such as that conveyed by painting, sculpture, music, clothing, architecture and other arts, often requires language to explain it. Linguistic resources are also linguistic data. As the most important form of data, linguistic data should be counted among the factors of production.

Linguistic data is a factor of production in the information era. What land is to farmers and machines are to workers, linguistic data is to computers. By taking in and processing linguistic data, computers can acquire knowledge and intelligence, heralding a new future for humanity. The nature of linguistic data as a production factor will become more evident as computational linguistics develops.

Entering the digital economy

In the 1950s, researchers began exploring machine translation, marking the beginning of efforts to train machines to process language information. Having solved the difficult problems of character processing and word processing, Chinese information processing has entered the speech processing stage, in which researchers strive to equip computers with linguistic intelligence. The rapid advances in information retrieval, automatic translation, machine writing and human-machine dialogue have all benefited from the aggregation and application of linguistic big data.

Classical linguistics regards language as a symbolic system unique to human beings. Thanks to the development of computational linguistics, two "species," humans and machines, can now share language. In many cases, important linguistic communication already follows a "human-machine-machine-human" interaction pattern, as in webinars, online classes, online shopping and online medical consultations.

Interacting with humanoid robots heightens the sense that machines have a command of language. As the Internet of Things develops, we will be able to implant a "language sensor" in any object that needs to be controlled, and people will be able to connect and communicate with everything through linguistically intelligent machines.

The Central Economic Work Conference held in December 2018 redefined infrastructure construction, designating 5G, artificial intelligence, the industrial internet and the Internet of Things as "new infrastructure." Over the past year, the concept of new infrastructure has been steadily enriched and its scope has gradually become clear. New infrastructure is more than the construction of information networks; it also emphasizes intelligentization, especially the introduction of linguistic intelligence that enables dialogue between people and all things.

Many language industries are related to information. Even in an age of advanced industrialization, a 2013 study by a Swiss language industry economist found that language industries contributed nearly 10% of Switzerland's GDP. In the information age, when data can become a factor of production, the economic weight of language industries will grow markedly. It is reasonable to predict that the digital economy cannot prosper unless language industries prosper as well.

In the future, perhaps very soon, language data will become an important factor of production, and language will become an important component of productivity. A major task in promoting production will be collecting and managing language data and realizing its full potential. Language industries and language-related professions will serve as an important pillar of the digital economy.

Ternary space

Before humankind came into being, the world was pure nature, a purely physical space. The emergence and development of humankind created a social space within that physical space, and language and social space grew together. About 30,000 to 50,000 years ago, in the Paleolithic period, humans already used a relatively mature spoken language, whose carrier was sound waves. Roughly 5,000 to 5,500 years ago, writing was invented in Mesopotamia, giving language a new carrier: light waves. In the 1920s, radio broadcasting and television emerged, and these electronic media gave language its third carrier: radio waves.

By the end of the 20th century, with the commercialization of the internet and the rapid development of language information processing, people had begun to construct a new type of space: information space, also known as virtual space or cyberspace. In his 2019 report entitled "Artificial Intelligence 2.0 and the Digital Economy," Academician Pan Yunhe astutely pointed out that human beings are gradually shifting from a binary spatial system, consisting of physical space and human society, to a ternary spatial system formed by physical space, human society and information space.

Information space is still developing, and our understanding of its structure and operating mechanisms is still gradually deepening. One thing, however, is relatively clear: information space is mainly a digitized language space. Language was once used only in social space; now it has been adopted by another space as well, information space. With the development of the Internet of Things and computational linguistics and the construction of new intelligent infrastructure, language will enter physical space and be used across the whole ternary space. The role of language in human production activities will grow accordingly.

Language is no longer merely a humanistic phenomenon. It has three major carriers, namely sound, light and electricity. It is shared by humans and machines. It will be applied across the ternary spatial system of social, information and physical spaces. Linguistics, as the science of "studying language and related issues," cannot be confined to "language and literature"; it should act as a comprehensive discipline spanning the liberal arts and the sciences.

In October 2017, Hiram College in the US put forward the educational concept of the "new liberal arts." It reorganized its 29 majors and incorporated new technologies into courses in philosophy, literature, language and other subjects. This move reflects the era's general trend toward interdisciplinary integration. China is also promoting the development of new engineering, new medicine, new agriculture and new liberal arts disciplines. Given the nature of language, linguistics should develop in line with the idea of the new liberal arts. A linguistics that integrates new technologies can respond to the demands and economic mechanisms of an era in which data is the key production factor of the digital economy, foster the knowledge economy, and advance the construction of new intelligent infrastructure. In turn, planners of new infrastructure and the knowledge economy should pay attention to language and linguistics in order to reap a scientific dividend.


Li Yuming is director and chief expert of the Beijing Advanced Innovation Center for Language Resources at Beijing Language and Culture University. This article was translated from Guangming Daily.

Editor: Yu Hui