Decoding dilemmas behind AI detection controversies
FILE PHOTO: Jianziyuan is a professional detection system designed to assess the risk of academic misconduct by screening for AI-generated text and textual similarity.
Renowned essayist Zhu Ziqing’s classic essay “Moonlight over the Lotus Pond” was recently flagged by an AI-detection system as having a 62.88% likelihood of being AI-generated. Even more absurdly, Tang Dynasty poet Wang Bo’s “Preface to the Pavilion of Prince Teng” was deemed nearly 100% AI-produced. These laughable results, which have been circulating widely online, have sparked growing public concern over the reliability of AI detection tools. Some worry their own papers could be misclassified, and fear that revising work to pass AI screening might actually undermine the quality of their writing. Others joke that one must “write poorly on purpose” to avoid triggering false positives. Many commentators argue that current AI detection technologies remain underdeveloped and are ill-suited for use as strict benchmarks in academic assessment. In light of these concerns, CSST interviewed several experts from both academia and industry, who examined the technical foundations and algorithmic limitations of AI detection tools and discussed potential ways to improve these systems.
Detection dilemmas
Dong Chenyu, an associate professor from the School of Journalism and Communication at Renmin University of China, recounted submitting a newly completed paper on the livestreaming industry to a certain academic detection platform, only to receive a result that was both frustrating and ironic. The paragraphs flagged as “highly suspected of being AI-generated” were, in fact, based on three years of fieldwork and multiple case studies by his team. Reflecting on the episode, he remarked that it underscores the immaturity of current AI detection technologies, which are plagued by both false positives (misjudging human writing as AI-generated) and false negatives (failing to identify AI-generated text). The underlying clash between technological logic and academic norms exacerbates such misjudgments.
Shi Shiping, editor-in-chief of Tianjin Social Sciences, explained that the normative language and rigorous logic pursued in academic writing closely align with the foundational logic of AI-generated text, since the models that produce it are trained to emulate standardized expression. This creates a paradox for detection systems: the more fluent and logical a text is, the more likely it is to trigger an “AI-generated” alert, turning the hallmarks of quality writing into “evidence” for erroneous accusations.
Zhan Bingqiang, founder of AIGCLINK, argues that from both a technical and a practical perspective, current AI detection remains immature and may even rest on a false premise. Since AI models learn from human knowledge systems through techniques such as supervised fine-tuning, the surface features of AI-generated language, including its linguistic structure and logical patterns, are becoming increasingly indistinguishable from human writing. As a result, the boundary between AI and human authorship is growing ever more blurred.
Moreover, current detection models suffer from methodological limitations, Zhan continued. First, most rely on a single indicator to build detection benchmarks, which are ill-suited to complex and dynamic textual scenarios. Second, to avoid missing AI content, some models are calibrated with overly sensitive thresholds, increasing the risk of false positives. Third, the lack of standardized evaluation criteria across different tools often yields dramatically different outcomes for the same text, showing that the technology lacks universal applicability.
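To see how a single indicator combined with an overly sensitive threshold produces the false positives Zhan describes, consider the following minimal sketch. It assumes a hypothetical detector that scores each passage solely by its perplexity under the open GPT-2 model and flags anything below a hard cutoff; the model choice and the threshold value are illustrative assumptions, not the method of Jianziyuan or any other tool mentioned in this article.

```python
# Hypothetical single-indicator detector: perplexity under GPT-2 plus a hard cutoff.
# Illustrative only; it does not model any tool named in this article.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity of `text` under GPT-2."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

# An overly "sensitive" cutoff: set low enough to catch most machine output,
# it also sweeps in fluent, well-edited human prose, which scores low as well.
THRESHOLD = 40.0  # illustrative value

def classify(text: str) -> str:
    return "suspected AI-generated" if perplexity(text) < THRESHOLD else "likely human-written"
```

Because polished academic prose and widely read classics tend to be highly predictable to a language model, a single score of this kind cannot cleanly separate them from machine output, which may be one reason texts such as “Moonlight over the Lotus Pond” can be misflagged.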
AI detection is far more challenging than traditional “plagiarism checking,” said Chen Yang, a professor from the College of Computer Science and Artificial Intelligence at Fudan University. Large language models (LLMs) use human-created corpora as input data during both the pre-training and fine-tuning phases, thereby learning and modeling the patterns of human writing. Under such circumstances, LLM-generated content often resembles, or partially overlaps with, human-authored texts. Consequently, whether the text in question is a literary classic or an original work by a contemporary writer, misclassification remains a real risk.
Challenging academic ecosystems
To verify the efficacy of AI detection, CSST used an AI-detection tool called “Jianziyuan” to analyze classic literary works such as “Moonlight over the Lotus Pond,” “Preface to the Pavilion of Prince Teng,” and “Diary of a Madman.” The results showed an AIGC probability of 0.0% across all texts, in stark contrast to the figures circulating online. This discrepancy reflects deeper issues in current AI-detection technologies.
Zhan analyzed the inconsistency from a technical perspective, noting that the significant discrepancies in detection results across different AI detection tools for the same text stem from the heterogeneity of their detection standards and technical paths. The algorithms, training data, and evaluation metrics used by developers vary widely, leading to fundamentally different technical paradigms and decision-making logic across tools. This makes detection outcomes highly context-dependent. In a specific academic context or for certain text types, a particular detection standard may demonstrate relatively high accuracy; once the subject domain, text genre, or linguistic style of the analyzed content changes, however, its validity diminishes considerably. Consequently, current AI detection technology is constrained by the multiplicity of technical standards and insufficient scenario adaptability. To improve reliability and consistency, Zhan argues, it is urgent to develop unified industry standards and multi-scenario validation mechanisms.
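The contradictory verdicts Zhan describes are easy to reproduce with a toy comparison. The sketch below pits two hypothetical “tools” against each other, one scoring sentence-length variation and the other lexical diversity, each with its own arbitrary threshold; both indicators and both thresholds are invented for illustration and do not represent any real product.

```python
# Two hypothetical detectors with different indicators and thresholds can return
# opposite verdicts on the same passage. Purely illustrative; no real tool is modeled.
import re
from statistics import pstdev

def sentence_length_variation(text: str) -> float:
    """Spread of sentence lengths; uniform sentences give a low score."""
    lengths = [len(s.split()) for s in re.split(r"[.!?]+", text) if s.strip()]
    return pstdev(lengths) if len(lengths) > 1 else 0.0

def lexical_diversity(text: str) -> float:
    """Type-token ratio; repetitive wording gives a low score."""
    words = re.findall(r"\w+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

def tool_a(text: str) -> str:  # "standard" A: flags uniform sentence rhythm
    return "AI-generated" if sentence_length_variation(text) < 4.0 else "human-written"

def tool_b(text: str) -> str:  # "standard" B: flags low lexical diversity
    return "AI-generated" if lexical_diversity(text) < 0.5 else "human-written"

sample = ("The detection systems disagree. The detection systems rely on "
          "different indicators. The same passage yields opposite verdicts.")
print(tool_a(sample))  # -> AI-generated (short, evenly sized sentences)
print(tool_b(sample))  # -> human-written (vocabulary is varied enough)
```

Without shared benchmarks, each vendor’s choice of indicator and threshold effectively defines its own notion of “AI-likeness,” which helps explain how the same essay can score 0.0% on one platform and over 60% on another.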
The application of AI detection in academia not only hampers the innovative transformation of academic output but also undermines systems of scholarly trust, said Li Qian, a research fellow from the Institute for Chinese Legal Modernization Studies at Nanjing Normal University. In terms of innovative transformation, to satisfy AI-detection standards, some scholars resort to simplifying their language, fragmenting expressions, and deliberately avoiding focused discussions of academic viewpoints. Others go even further, introducing irrelevant symbols or altering sentence structures to “evade” detection. These tactics often backfire, degrading both the quality and clarity of scholarly writing. Regarding academic trust, frequent misjudgments by AI detection systems risk eroding scholars’ confidence in AI tools, thereby disrupting the healthy development of academic ecosystems.
Comprehensive review mechanism needed
Scholars interviewed generally fall into two camps in their views regarding AI detection. One group calls for improving detection accuracy through technological innovation and establishing multi-layered review mechanisms to enhance screening capabilities. The other questions the utility of AI detection altogether, arguing for a shift away from the narrow focus on identifying AI-generated traces toward the broader restructuring of academic evaluation systems and the development of human-AI collaboration frameworks.
Shen Xibin, director of the new media department at the Chinese Medical Association Publishing House, acknowledged that while AI detection tools are helpful in identifying obviously AI-generated documents and curbing academic misconduct, their technical limitations are also significant. As detection capabilities improve, these tools have expanded their screening scope to such an extent that even routine text polishing is frequently misidentified as AI-generated content, leading to unnecessary consumption of editorial resources and reduced publishing efficiency. Moreover, when processing vast volumes of literature, high error rates caused by algorithmic biases and limited data samples reveal flaws in the existing technology’s theoretical framework, algorithm design, and contextual adaptability. Shen believes that strengthening technological upgrades and refining detection standards can enhance the reliability of AI detection tools, allowing them to play a more substantial role in academic review processes.
Liu Fangxi, a research fellow from the Institute of Literature at the Chinese Academy of Social Sciences, questioned the viability of traditional detection models from the perspective of technological development trends. He pointed out that as technological iteration accelerates, neither expert judgment nor detection software will be able to reliably distinguish AI-generated content from original human writing. Given this reality, relying solely on AI detection tools can no longer meet the needs of academic review. Instead, a comprehensive review mechanism must be established, including author declarations of AI use and negative-list management.
Shi also suggested that until AI technology matures, greater emphasis should be placed on authors’ originality declarations, with corresponding measures taken against those who use AI without disclosure. Academic journals must leverage editors’ proactive judgment, treating AI detection as an auxiliary rather than a decisive tool. More effort should be directed toward improving peer review and expert evaluation mechanisms, assessing a study’s originality and scholarly merit comprehensively rather than its mere conformity to textual norms.
Li proposed establishing a dynamic, human-AI integrated academic evaluation system encompassing the scholarly assessment process, AI detection tool development, and detection parameter configuration. Regarding parameter settings, discipline-specific similarity thresholds should be customized based on the distinctive traits of academic outputs across different fields.
Zhan argues that an obsessive focus on determining whether a text is AI-generated lacks long-term validity in both academic logic and practical application. Instead, he advocates for building a detection system centered on evaluating the innovativeness of academic work. By assessing the novelty of research questions, the uniqueness of methodologies, and the contribution of findings, we can accurately identify the value of academic creations and reshape the standards for evaluating academic quality.
Editor: Yu Hui