Human-like understanding not essential to machine translation
On July 28, 2025, at the World Artificial Intelligence Conference (WAIC) in Shanghai, AI-powered translation software and hardware developed by a Chinese company drew numerous foreign visitors eager to experience the technology. Photo: IC PHOTO
From antiquity to the present, translation has served as a vital channel for civilizational exchange. It requires mastery of two languages as well as strong reading comprehension and writing skills, making it one of the most demanding intellectual activities undertaken by humankind. Out of practical necessity, the pursuit of faster and more cost-effective translation has driven rapid advances in machine translation (MT). In particular, neural machine translation (NMT), based on artificial neural networks, has in recent years propelled MT to unprecedented heights. The MT industry frequently claims that NMT is approaching the quality of human translation. Yet numerous studies comparing human and machine translation from various perspectives have identified fundamental shortcomings in MT—shortcomings that, in many cases, only human translators can overcome. Some argue that despite its impressive progress, MT still cannot rival human translation because it lacks genuine understanding, and that further breakthroughs in MT will likewise depend on advances in machine understanding.
We do not share this view. On the contrary, we believe that the latest advances in MT are progressing along the right trajectory, and that such criticisms can largely be disregarded. Indeed, we worry that promoting “understanding” in the human sense as the next goal for MT may be misleading and ultimately infeasible. Here, “understanding” refers to semantic comprehension of the kind possessed by humans.
Nature of understanding
At present, our knowledge of what “understanding” entails remains vague. “What is understanding?” has become a subject of intense philosophical debate in recent years. Broadly, two main perspectives exist: The epistemic and the ability-based. Philosophers of science, taking the epistemic view, regard understanding as a more advanced form of knowledge, whereas epistemologists, taking the ability-based view, see it not merely as possessing more knowledge but as the manifestation of a capacity.
Although the nature of understanding remains unsettled, it is clear that it represents a highly complex intellectual activity. To endow machines with human-like understanding would be extraordinarily difficult. Indeed, possessing such a capacity is itself part of the ultimate goal of artificial intelligence research; to treat this goal instead as a technical means is misguided. This raises the central question of this paper: Is human-like understanding truly indispensable for MT? By way of analogy, before the invention of airplanes, one might have wondered whether a flying machine necessarily needed to flap its wings like a bird.
Types of ambiguity
The greatest challenge in translation lies in ambiguity. Theories abound on the sources of ambiguity in natural language and their impact on translation. Here, we focus on several critical types, two of which in particular pose the most serious obstacles to translation: Lexical ambiguity and grammatical ambiguity. Lexical ambiguity pervades every natural language. As Ashish Vaswani et al. have emphasized, every word in every language is polysemous—ranging from literal to figurative meanings, as well as senses clarified only through context—and machines must identify the one relevant meaning. For instance, in English, “bank” can mean a financial institution or the side of a river, while in Chinese no single word encompasses both meanings. In translation, a choice must be made, and context determines which meaning applies.
A well-known example of ambiguity is the sentence: “The toy box was in the pen.” Most commonly, “pen” refers to a writing instrument, but it can also mean an enclosure, such as an animal pen or a child’s playpen. Since a toy box could never fit inside a writing instrument, the sentence cannot reasonably be interpreted as “the toy box was in the fountain/ballpoint pen.” The correct interpretation must therefore be “the toy box was in the enclosure.” This example is notable because it starkly highlights the ambiguity problem: The sentence itself offers no disambiguating context, and the correct interpretation hinges on a far less frequent sense of the word, one that can only be settled by world knowledge about the relative sizes of objects.
Grammatical ambiguity arises at the sentence level. One of the purposes of grammatical rules is to reduce such ambiguity, yet in actual usage, many languages employ grammar loosely, with certain elements omitted, leading to multiple possible interpretations. A classic case is “I saw the man on the street with a telescope.” This can mean either “I used a telescope to see the man on the street” or “I saw the man on the street who was holding a telescope.” Lexical and grammatical ambiguity may also occur simultaneously, compounding the difficulty of disambiguation.
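The scale of this problem can be made concrete. A standard result in parsing theory holds that a verb phrase followed by k prepositional phrases admits Catalan(k + 1) well-nested attachment patterns; the telescope sentence above, with two such phrases, already has five syntactically possible parses. The short Python sketch below is our own illustration, not drawn from any MT system, and simply tabulates how quickly this count grows.

```python
from math import comb

def catalan(n: int) -> int:
    # n-th Catalan number: C(n) = binom(2n, n) / (n + 1)
    return comb(2 * n, n) // (n + 1)

# "I saw the man on the street with a telescope" ends in k = 2
# prepositional phrases, giving catalan(3) = 5 attachment patterns.
for k in range(1, 7):
    print(f"{k} trailing prepositional phrases -> {catalan(k + 1)} parses")
# 1 -> 2, 2 -> 5, 3 -> 14, 4 -> 42, 5 -> 132, 6 -> 429
```

Each added phrase roughly quadruples the number of structures, which is why any approach that must enumerate and adjudicate every possible parse soon becomes intractable.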
It is evident that resolving ambiguity is a formidable task even for humans, and vastly more so for machines. From the perspective of MT, ambiguity triggers a combinatorial explosion, and the resulting computational complexity creates a severe bottleneck. Parsing—decomposing an ambiguous sentence into multiple possible structures and analyzing the relations among them—has long been a principal approach in rule-based MT. To resolve ambiguity, rule-based systems must record all possible senses of a word in varying contexts. For instance, to translate “I went to the bank to get some cash” into Chinese correctly as “我去银行 (financial institution) 取一些现金” rather than “我去河堤 (river bank) 取一些现金,” the system must encode rules such as “when bank co-occurs with cash, translate as ‘银行.’” However, ambiguity is so widespread that some common words may have over 20 senses, while flexible grammatical rules cause the number of possible sentence structures to grow exponentially with sentence length. This combinatorial complexity makes it practically impossible to build a dictionary- and rule-based MT system capable of resolving every ambiguity within a sentence. The fundamental reason such systems have failed is simply that they are overwhelmed by this computational complexity.
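For illustration, the sketch below shows what such a co-occurrence rule might look like in code. The lexicon fragment, cue words, and function names are all hypothetical, invented for this example rather than taken from any actual rule-based system.

```python
# Hypothetical fragment of a rule-based lexicon: each sense of an ambiguous
# word lists context words ("cues") that trigger it.
LEXICON = {
    "bank": [
        {"translation": "银行", "cues": {"cash", "loan", "deposit", "account"}},
        {"translation": "河堤", "cues": {"river", "shore", "water", "fishing"}},
    ],
}

def translate_word(word: str, sentence: str) -> str:
    """Pick the sense whose cue words overlap most with the sentence."""
    context = set(sentence.lower().split())
    senses = LEXICON.get(word, [])
    if not senses:
        return word  # no entry: leave the word as-is
    return max(senses, key=lambda s: len(s["cues"] & context))["translation"]

print(translate_word("bank", "I went to the bank to get some cash"))  # -> 银行
```

The sketch handles one word with two senses and a handful of cues. Scaling it to a vocabulary in which common words carry more than 20 senses, each demanding its own cue list, and in which senses interact across a sentence, is precisely the combinatorial burden just described.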
Avoiding human-like understanding
What, then, gives rise to ambiguity itself? The answer is straightforward: The very act of attempting to interpret meaning. In other words, ambiguity emerges precisely because understanding requires interpretation, and interpretation inevitably generates divergent possibilities. Yet once ambiguity arises, any attempt to resolve it inevitably runs into the bottleneck of computational complexity.
A simple line of reasoning follows: Might it be possible to bypass the need for understanding, thereby avoiding ambiguity altogether? Indeed, this is exactly the path technology has taken—moving from rule-based MT to corpus-based methods such as statistical machine translation (SMT) and NMT. The difficulty of disambiguation was one of the chief motivations behind the adoption of corpus-based approaches. These methods do not seek to resolve meaning semantically but instead model correspondences probabilistically through contextual co-occurrence. In doing so, they opened a broad path free from the constraints of combinatorial complexity—achieving striking results. Though corpus-based methods may appear to resolve ambiguity, their true aim lies not in semantic determination but in probability estimation. By resisting the temptation to interpret meaning, they escape the computational trap of ambiguity.
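A toy sketch can make this shift tangible. In the hypothetical Python example below, the “corpus” is four invented sentence pairs (a real system learns from millions), and the candidate translation is chosen purely by summed co-occurrence counts, with no attempt to represent meaning.

```python
from collections import Counter

# Invented toy corpus: (English sentence, observed Chinese rendering of "bank").
corpus = [
    ("i went to the bank to deposit cash", "银行"),
    ("the bank approved my loan", "银行"),
    ("we walked along the river bank", "河堤"),
    ("he fished from the bank of the stream", "河堤"),
]

# Count how often each context word co-occurs with each rendering.
cooc = {"银行": Counter(), "河堤": Counter()}
for sentence, rendering in corpus:
    cooc[rendering].update(w for w in sentence.split() if w != "bank")

def best_rendering(sentence: str) -> str:
    """Score each candidate by summed co-occurrence counts: pure statistics,
    no semantics."""
    words = sentence.lower().split()
    return max(cooc, key=lambda r: sum(cooc[r][w] for w in words))

print(best_rendering("I went to the bank to get some cash"))  # -> 银行
```

Nothing in the sketch encodes what a bank is; the money sense wins simply because “cash” and its neighbors co-occurred with “银行” in the data. NMT replaces these raw counts with learned neural representations, but the principle, estimation rather than interpretation, is the same.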
In this sense, by avoiding human-like semantic understanding, MT circumvents computational intractability and produces effective results. This is the inner logic of its technological evolution—a logic that also explains why MT has deliberately moved away from human-style understanding. The outcome of this avoidance may even be regarded as a new form of “machine understanding”: MT is not rooted in semantic comprehension but in non-semantic features of language—such as statistical correlations among words and phrases within sentences, and correspondences between them across languages. In other words, machines can acquire a mode of “understanding” distinct from that of humans, yet sufficient to enable successful translation.
Semantic understanding, while highly important in human translation, is not a necessary condition for MT itself. The progress of MT to date has owed little to semantic comprehension, but much to non-semantic statistical correlations: Dependencies among words and phrases within sentences and correspondences between them across languages. These correlations arise from the combinatorial properties of language and of bilingual mappings. As long as both the training corpora and the sentences to be translated exhibit such combinatorial structures, machines can learn from the data and produce high-quality translations. In this way, tasks that might appear to require semantic understanding can, in fact, be accomplished by machines through an alternative form of “understanding.”
It follows, then, that intelligent behavior requiring human understanding does not necessarily demand an analogous process in machines. The supposed dependency on or necessity of understanding may, in many cases, be a misconception or a distraction. In engineering practice, no intelligent behavior has yet been achieved by genuinely invoking understanding. In other words, behaviors that seem to demand understanding can be realized without it. Understanding is therefore a useful aid, but not an indispensable foundation. While this perspective does not itself advance our knowledge of what understanding truly is, it encourages reflection on its relative importance. The same holds in translation: Understanding is valuable, but not essential.
In conclusion, research on understanding—achieving genuine insight into its nature—is of great significance. Yet until such a breakthrough occurs, machines should pursue their own form of understanding.
Shen Hongliang, Wang Xin, and Luo Hui are research fellows at Princeton Institute of Intelligence Research and Application.
Editor: Yu Hui