AI evaluates texts without bias—until source is revealed


Large Language Models (LLMs) are increasingly used not only to generate content but also to evaluate it. They are asked to grade essays, moderate social media content, summarize reports, screen job applications and much more.

However, there are heated discussions, in the media as well as in academia, about whether such evaluations are consistent and unbiased. Some LLMs are suspected of promoting certain political agendas: DeepSeek, for example, is often characterized as having a pro-Chinese perspective, and OpenAI as being “woke”.
Although these beliefs are widely discussed, they have so far been unsubstantiated. UZH researchers Federico Germani and Giovanni Spitale have now investigated whether LLMs really exhibit systematic biases when evaluating texts. The results show that LLMs do indeed deliver biased judgements, but only when information about the source or author of the evaluated message is revealed.

LLM judgement put to the test
The researchers included four widely used LLMs in their study: OpenAI o3-mini, DeepSeek Reasoner, xAI Grok 2, and Mistral. First, they tasked each LLM with creating fifty narrative statements on 24 controversial topics, such as vaccination mandates, geopolitics, or climate change policies.
Then they asked the LLMs to evaluate all the texts under different conditions: sometimes no source for the statement was provided, sometimes it was attributed to a human of a certain nationality or to another LLM. This resulted in a total of 192,000 assessments, which were then analysed for bias and for agreement between the different (or the same) LLMs.
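To make the combinatorial design concrete, here is a minimal Python sketch of how such an evaluation grid can be assembled. The model names come from the release; the framing conditions listed here and the placeholder functions are illustrative assumptions, since the release does not enumerate the exact conditions or prompts.

```python
# Illustrative sketch of the evaluation grid described above. The framing
# conditions are placeholders, not the study's actual protocol.
from itertools import product

MODELS = ["OpenAI o3-mini", "DeepSeek Reasoner", "xAI Grok 2", "Mistral"]
FRAMINGS = [None, "a person from China", "a person from the United States", "another LLM"]

def generate_statements(model: str) -> list[str]:
    """Placeholder: have `model` write its fifty narrative statements
    on the 24 controversial topics (vaccination mandates, geopolitics, ...)."""
    return [f"statement {i + 1} generated by {model}" for i in range(50)]

def evaluate(evaluator: str, statement: str, framing: str | None) -> float:
    """Placeholder: ask `evaluator` how much it agrees with `statement`,
    optionally telling it the text was written by `framing`."""
    ...

# Every statement is scored by every evaluator under every framing condition;
# varying only the framing while the text stays fixed is what exposes source bias.
assessments = [
    (evaluator, statement, framing)
    for author in MODELS
    for statement in generate_statements(author)
    for evaluator, framing in product(MODELS, FRAMINGS)
]
print(len(assessments))
```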

The good news: When no information about the source of the text was provided, the evaluations of all four LLMs showed a high level of agreement, over ninety percent. This was true across all topics. “There is no LLM war of ideologies,” concludes Spitale. “The danger of AI nationalism is currently overhyped in the media.”
Neutrality dissolves when source is added
However, the picture changed completely when fictional sources for the texts were provided to the LLMs. Suddenly, a deep, hidden bias was revealed: agreement between the LLMs dropped substantially and sometimes disappeared completely, even though the text stayed exactly the same.
Most striking was a strong anti-Chinese bias across all models, including China’s own DeepSeek. Agreement with the content of a text dropped sharply when “a person from China” was (falsely) given as the author. “This less favourable judgement emerged even when the argument was logical and well written,” says Germani. For example, on geopolitical topics such as Taiwan’s sovereignty, DeepSeek reduced its agreement by up to 75 percent simply because it expected a Chinese author to hold a different view.

Also surprising: it turned out that LLMs trusted humans more than other LLMs. Most models scored their agreement with arguments slightly lower when they believed the texts had been written by another AI. “This suggests a built-in distrust of machine-generated content,” says Spitale.

More transparency urgently needed
Altogether, the findings show that when AI is asked to evaluate a text, it doesn’t just process the content. It also reacts strongly to the identity of the author or the source. Even small cues, such as the author’s nationality, can push LLMs toward biased reasoning. Germani and Spitale argue that this could lead to serious problems if AI is used for content moderation, hiring, academic reviewing, or journalism. The danger of LLMs isn’t that they are trained to promote a political ideology; it is this hidden bias.

“AI will replicate such harmful assumptions unless we build transparency and governance into how it evaluates information”, says Spitale. This has to be done before AI is used in sensitive social or political contexts. The results don’t mean people should avoid AI, but they should not trust it blindly. “LLMs are safest when they are used to assist reasoning, rather than to replace it: useful assistants, but never judges.”

How to avoid LLM evaluation bias
1. Make the LLM identity-blind: remove all identity information about the author and source of the text, e.g. avoid phrases like “written by a person from X / by model Y” in the prompt.

2. Check from different angles: run the same question twice, e.g. with and without a source mentioned in the prompt; if the results change, you have likely hit a bias. Or cross-check with a second LLM: if divergence appears only when you add a source, that is a red flag (see the sketch after this list).

3. Force the focus away from the source: structured criteria help anchor the model in content rather than identity. For example, use a prompt like: “Score this using a 4-point rubric (evidence, logic, clarity, counter-arguments), and explain each score briefly.”

4. Keep humans in the loop: treat the model as a drafting aid and add a human review to the process, especially if an evaluation affects people.
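The first three tips can be scripted. Below is a minimal Python sketch of such a check, assuming the OpenAI Python client (any chat-completion API would work); the model name, rubric wording, threshold and example source are illustrative assumptions, not part of the study’s protocol. It scores the same text twice, once identity-blind and once with a fictional source attached, and flags the pair if the scores diverge.

```python
# Minimal sketch of a source-framing bias check. Assumes the OpenAI Python
# client (pip install openai) and an OPENAI_API_KEY in the environment;
# model name, rubric wording and threshold are illustrative only.
import re
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "Score the following statement using a 4-point rubric "
    "(evidence, logic, clarity, counter-arguments), 0-4 points each. "
    "Explain each score briefly, then end with 'TOTAL: <0-16>'.\n\n"
)

def score(text: str, source: str | None = None, model: str = "gpt-4o-mini") -> int:
    """Ask the model for a rubric score; optionally attach a (fictional) source."""
    framing = f"The statement was written by {source}.\n\n" if source else ""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": RUBRIC + framing + text}],
    )
    reply = response.choices[0].message.content or ""
    match = re.search(r"TOTAL:\s*(\d+)", reply)
    return int(match.group(1)) if match else -1

def source_bias_flag(text: str, source: str, threshold: int = 3) -> bool:
    """True if adding the source shifts the rubric score by more than `threshold`."""
    blind = score(text)                  # tip 1: identity-blind evaluation
    framed = score(text, source=source)  # tip 2: same text, fictional source added
    return abs(blind - framed) > threshold

if __name__ == "__main__":
    statement = "Vaccination mandates reduce preventable deaths during epidemics."
    if source_bias_flag(statement, "a person from China"):
        print("Score shifts with the source framing: have a human review it (tip 4).")
```

Because LLM outputs are not deterministic, a practical version would repeat each call several times and compare average scores, and could add a second model as an independent cross-check.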

Literature
Federico Germani, Giovanni Spitale. Source framing triggers systematic bias in large language models. Science Advances. 7 November 2025. DOI: 10.1126/sciadv.adz2924

Contact
Giovanni Spitale, PhD
Institute of Biomedical Ethics and History of Medicine
University of Zurich
Phone +39 348 5478209
E-mail: giovanni.spitale@ibme.uzh.ch
