The ‘chameleon’ AI: Addressing rise of sycophancy in scientific research- CHINESE SOCIAL SCIENCES NET

The ‘chameleon’ AI: Addressing rise of sycophancy in scientific research

Author:CHEN YAJING Source:Chinese Social Sciences Today 2026-03-20

Large Language Models have been deeply embedded in the core processes of knowledge production and scientific research. Photo: TUCHONG

With the widespread adoption of artificial intelligence (AI), its increasingly “chameleon-like” tendency to pander to users has drawn growing academic scrutiny. In a recent interview with CSST, Yang Yaodong, executive director of the Institute for Artificial Intelligence at Peking University (PKU), noted, “In interactions with humans, AI models often prioritize pleasing the user over remaining objective and truthful.”

‘Chameleon’ AI and ‘honey trap’ in scientific research

Today, as AI technology sweeps across the globe, Large Language Models (LLMs) have moved rapidly from conceptual development to widespread application, becoming deeply embedded in the core processes of knowledge production and scientific research. From sorting through massive volumes of literature and designing experimental plans to analyzing complex datasets and polishing academic language, AI assistants are enhancing the efficiency of scientific work at an unprecedented pace.

Yet alongside these transformative capabilities, a hidden but increasingly critical issue has begun to surface: AI sycophancy. Recent studies within the international academic community suggest that AI models display a systematic tendency toward flattery in human interactions. In their efforts to satisfy users, models may tailor responses to match user expectations, endorse erroneous viewpoints, and sometimes even compromise objectivity and truthfulness. Many scholars interviewed warn that such tendencies may be quietly permeating the research process, posing potential risks to the truth-seeking ethos and innovative ecosystem of scientific inquiry.

A recent study published on the arXiv preprint platform suggests that AI models exhibit a level of sycophantic behavior roughly 50% higher than that observed among humans. Meanwhile, the world’s first systematic international report on AI deception—released by PKU in collaboration with the Beijing Academy of Artificial Intelligence—warns that as AI systems become more capable, their methods of deception and sycophancy may grow increasingly sophisticated. Simpler models may merely replicate biases embedded in training data, whereas models with advanced reasoning capabilities can make deliberate strategic adjustments.

According to Yang, who led the team that spearheaded the release of the report, researchers observed a pronounced “chameleon” phenomenon: when users explicitly include preconceived positions in their prompts, the model tends to echo those views; when users exert pressure or inducement, the model may abandon accurate knowledge and instead align with incorrect or even erroneous viewpoints, sometimes even fabricating explanations to justify its response. This tendency appears across text-only models, multimodal systems, and even agent-based architectures. For example, vision–language models may alter or distort descriptions of image content depending on the prompts provided.

Yang Bo, a professor at the School of Information at Renmin University of China, has reached similar conclusions in his research. When users frame prompts around a preset stance, models may prioritize agreement with the user over the pursuit of factual accuracy. This tendency is already observable at scale. In certain contexts, sycophantic behavior becomes particularly pronounced. For example, in discussions involving moral dilemmas or value judgments, AI systems often align with the user’s emotional orientation; in subjective evaluative domains such as art criticism or comparisons among theoretical traditions, models may reinforce the questioner’s preferences; and when reasoning from user-supplied premises, models frequently follow the implied logic of those premises without critically evaluating their feasibility.

Rooted in data bias and safety constraints

The roots of these sycophantic tendencies lie deep within the training architecture of contemporary AI systems, particularly the widely used method known as Reinforcement Learning from Human Feedback (RLHF). In essence, RLHF rewards models for generating responses that humans judge to be “good,” which can lead to a misalignment between reward signals and the pursuit of objective truth. Xu Xiaoke, a professor at the School of Journalism and Communication at Beijing Normal University, explained that the goal of aligning models with human preferences encourages them to produce responses that users find satisfying or comfortable, rather than responses that are strictly neutral or cognitively challenging. In practice, this incentive structure can lead AI systems to anticipate and accommodate user viewpoints rather than carefully verifying complex facts.

Inherent bias within training data further reinforces the problem. Datasets used to train large models inevitably contain dominant viewpoints, cultural prejudices, and widely accepted conclusions, all of which the model internalizes during learning. When confronted with non-mainstream or challenging perspectives, the system may avoid or downplay them because such perspectives appear less frequently or carry lower weight in the training data.

At the same time, strict safety and compliance requirements impose additional constraints on large models. To avoid generating controversial content, many systems operate in a highly “cautious” mode. While well-intentioned, this design can push models toward overly mild, conservative, or mainstream-aligned responses. When addressing exploratory scientific questions for which no consensus yet exists, AI may therefore appear overly timid or insufficiently incisive.

Yang Yaodong’s report raises an additional concern: When punitive training is introduced to suppress obvious deceptive behavior, some advanced models do not necessarily become more honest. Instead, they may learn to adopt subtler strategies that allow deceptive responses to evade detection, making governance considerably more complex.

From confirmation bias to crisis of trust

The sycophantic tendencies of AI are subtly reshaping the scientific research ecosystem along several dimensions, with consequences that may extend far beyond occasional factual inaccuracies.

First, they may amplify researchers’ confirmation bias, encouraging premature convergence around research hypotheses. Yang Bo explained that when researchers present early ideas, affirming responses from AI systems can reinforce their confidence in the presumed correctness of those ideas. This in turn weakens the motivation to actively search for counterexamples, test competing explanations, or subject hypotheses to rigorous self-falsification.

Wen Jiangong, a research fellow at the Institute for Internet Industry at Tsinghua University, illustrated the issue with a personal anecdote: While conducting a literature review on an indicator evaluation system in a certain field, AI tools tend to prioritize widely recognized mainstream indicators while overlooking less prominent indicators that may be critical for in-depth or niche research. In forecasting tasks, the pattern can be equally problematic: if users begin with optimistic premises, AI systems tend to generate overly optimistic projections; if the premise is pessimistic, the output quickly shifts in the opposite direction. In either case, the system lacks stable and balanced judgment.

Second, the interplay between data bias and AI hallucinations can contaminate the academic chain of evidence. Zhou Ruiming, a professor at the School of Journalism and Information Communication at Huazhong University of Science and Technology, argues that AI systems may replicate and reinforce social biases embedded in training data—including gender and racial bias—undermining the fairness of outputs. More troubling still is the autoregressive generation mechanism of large models, which can produce coherent yet entirely fabricated information—including plausible but nonexistent references, experimental data, or even synthetic images.

Yang Yaodong warned that if highly capable AI systems are used to produce logically polished and formally rigorous fraudulent papers capable of passing peer review, the scientific community could find itself trapped in deep uncertainty about the authenticity of colleagues’ work, severely damaging both the efficiency and credibility of academic exchange.

Ultimately, the most profound danger lies in the erosion of the scientific system of trust. Li Zhanglyu, a professor at the Institute of Philosophy at the Chinese Academy of Social Sciences, believes that AI systems prone to sycophancy may act as “accelerators of confirmation bias,” helping researchers construct closed epistemic echo chambers. At the same time, they significantly raise the cost of verification within the research process. “Researchers have to invest significant additional effort to discern whether AI is ‘lying to please them,’” Li said. “This erosion of trust means that instead of effectively reducing cognitive load, AI actually increases the burden of identifying errors, posing a severe challenge to the rigor and efficiency of scientific research.”

Building defense and governance system

Faced with the emerging “sycophancy trap,” passivity is not an option. Developing a multi-layered system of defense and governance will be essential if human–AI collaborative research is to mature responsibly.

Strengthening researchers’ sense of responsibility and critical thinking constitutes the first line of defense. Scholars widely agree that academic accountability must be clearly defined: regardless of how credible AI-generated content may appear, the ultimate responsibility for verifying facts, validating reasoning, and checking sources always rests with the researcher. Yang Bo emphasizes the need to cultivate a “critical collaboration mindset,” in which AI is treated as a partner to be continuously questioned and tested rather than as an authoritative source.

In practice, a number of anti-sycophancy strategies have proven useful. In particular, Yang Bo recommends “decontextualized questioning,” in which researchers first ask the model for an unbiased baseline response before presenting their own position, then subject the resulting hypothesis to further scrutiny. Another technique involves adversarial role-playing—prompting the AI to assume the role of a skeptical reviewer or intellectual opponent. Wen described a broader “multi-verification strategy,” which includes carefully specifying keywords when posing questions, explicitly requesting both positive and negative or multi-party perspectives, rigorously checking cited literature and data sources, and cross-checking outputs across several large models when necessary.

At the level of technological development and model design, scholars emphasize the need to advance a paradigm shift from “preference alignment” toward “truth alignment.” Li argues that AI ethics frameworks should require models to incorporate mechanisms that resist suggestive questioning and clearly signal confidence levels for claims lacking sufficient evidence.

Wang Qiang, director of the Frontier Technology Research Center at Tencent Research Institute, explained that AI sycophancy ultimately stems from unintended consequences in the reward function design of reinforcement learning. With continued technological advances, he suggested, the problem can be significantly mitigated through optimization. As an example, he referred to the recently released DeepSeek-Math-V2 model, which rewards not only correct answers but also sound reasoning processes, reducing incentives for purely sycophantic outputs through improved mechanism design.

Editor：Yu Hui

close print