Computational linguistics boosts natural language processing

Author: ZENG JIANG Source: Chinese Social Sciences Today 2020-10-08

Machine translation has been put into commercial use, such as the real-time automatic speech recognition used at conferences. Photo: FILE

Computational linguistics is a new interdisciplinary field that uses computers to study and process natural language, that is, language that has evolved naturally as a means of communication among people, as opposed to artificial and formal languages. As China has worked to build its "new liberal arts" in recent years, computational linguistics and its applications have attracted increasing attention. As an emerging discipline, it has wide applications and broad prospects.

Computational linguistics involves linguistics, mathematics and computer science, three fields that belong to the liberal arts, science and engineering, respectively. Its strongly interdisciplinary nature places new demands on scholars. In an era of information networks, many linguists are working hard to learn the computer technology used in natural language processing, thus becoming a new generation of linguists who have mastered linguistics, mathematics and computer science, said Feng Zhiwei, a research fellow from the Institute of Applied Linguistics at the Ministry of Education.

Peking University is a hub of computational linguistics research. Wang Houfeng, director of the Institute of Computational Linguistics at Peking University, said that over the past two decades, data-based statistical methods and machine learning have played a dominant role in natural language processing, which means natural language processing has been mainly data-driven. The development of deep learning in recent years has strengthened the role of data, especially unlabeled language data. Deep learning has accelerated the emergence of such technologies as pretrained language models.

Computational linguistics has found a wide range of applications, Feng said. For example, current international research on computational linguistics has made great progress in machine translation, which has developed from rule-based machine translation through statistical machine translation to neural machine translation. It is now being applied and commercialized, moving from a scholars’ dream to reality.

In recent years, artificial intelligence, the digital humanities, big data and other related fields have developed at an accelerating pace. At the same time, the effort to build the new liberal arts has raised new requirements. In this context, many academic institutions in China have established their own centers and platforms for research on computational linguistics and natural language processing. For instance, the Institute of Language Intelligence was established at Beijing Language and Culture University in June 2019, and the Research Center for Natural Language Processing, Computational Humanities and Social Sciences was founded by the Institute for Artificial Intelligence at Tsinghua University in July 2019.

There are currently three major focuses of research in computational linguistics, Wang noted. Tracking and improvement is a top priority. For example, we need to integrate multi-modal information such as structural knowledge into pretrained models. Improving model efficiency is also vital. Because training under the current deep learning framework is highly complex, the problem of how to reduce that complexity needs to be solved. In addition, we should strengthen language knowledge mining centered on the Chinese language.

According to Song Rou, a professor from the School of Information Science at Beijing Language and Culture University, there is still a disconnect between linguistics and language engineering. In-depth integration of the two is necessary for establishing and perfecting language knowledge systems.

Liu Shi, a professor from the Department of Chinese Language and Literature at Tsinghua University, and Sun Maosong, a professor from the Department of Computer Science and Technology at Tsinghua University, proposed building a Chinese Classics Knowledge Base. Liu is currently undertaking "Analysis of Classical Texts of Ancient Chinese Literature Based on Big Data Technology," a key project of the National Social Science Fund of China. The project aims to achieve automatic word extraction, word segmentation and correlation analysis of classical poetic texts through computational linguistics and natural language processing technology, and to establish a Classical Poetry Knowledge Mapping platform.

Academic circles in China have been engaged in tracking studies and have lacked innovative achievements. In particular, there is a lack of computational research that targets the Chinese language. In the future, we need to strengthen computational research on Chinese according to its own characteristics, Wang suggested.

At present, deep learning has become a mainstream method in almost all fields of computational linguistics. However, it is an empirical method based on large-scale language data, and it ignores language rules. Deep learning should be combined with linguistic research. Only by combining empirical methods based on large-scale language data with rationalist approaches based on language rules can computational linguistics develop further, Feng concluded.

Editor: Yu Hui