Friend or Foe? The Gap Between Human and AI Social Intention Perception

28/05/2026 Tohoku University

Imagine a figure approaching in the distance. Before seeing their face or hearing their voice, you must instantly decide: friend or threat? While humans effortlessly read subtle body language to make this split-second survival judgment, artificial intelligence (AI) continues to struggle. Historically, AI research has focused on recognizing basic emotions (like happiness) or physical actions (like walking), ignoring social intention: the signals a person directs at others. For a service robot or AI agent, knowing whether a person poses a threat is far more important than simply identifying their emotion.

Now, researchers have established a new benchmark for "embodied social intention," uncovering how we signal threats and revealing a critical "alignment gap" between human cognition and AI.

To study how humans communicate these signals, researchers at Tohoku University recorded 160 motion-capture performances by 80 performers from Japan and Taiwan. The performers conveyed friendly or hostile intentions to an "imaginary alien" who had just landed on Earth and possessed no knowledge of human culture or language, forcing the performers to rely purely on non-verbal body language.

Common friendly actions conveyed to the alien included bending forward to show politeness and humility, and opening the arms in an open-body greeting. For hostile interactions, the performers used threatening behaviours such as throwing objects to drive the alien away.

The researchers also recruited 77 observers from Japan, Taiwan, and China, who watched all 160 videos and judged whether each one was friendly or hostile. Interestingly, Taiwanese performers tended to use big, forceful movements to show their hostility. Their fast, physically powerful motions made their hostile performances easy for all viewers to read. However, Japanese performances were different.

Their hostile movements were smaller and more controlled, containing about one-tenth of the motion energy of the Taiwanese clips. Japanese viewers recognized these subtle signals significantly more accurately (76% accuracy) than Taiwanese and Chinese viewers (69% and 65%, respectively).
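
The release does not say how motion energy was computed. As a rough illustration only, one common proxy is the mean squared speed of the tracked joints across a clip. The function name, the frame layout, and the default frame rate below are assumptions made for this sketch, not the authors' method.

```python
def motion_energy(frames, fps=30.0):
    """Mean squared joint speed over a clip (an assumed proxy for motion energy).

    frames: list of frames; each frame is a list of (x, y, z) joint positions,
            with the same joint order in every frame.
    fps:    frames per second, used to convert per-frame displacement to speed.
    """
    if len(frames) < 2:
        return 0.0  # no movement can be measured from fewer than two frames

    total = 0.0
    count = 0
    # Compare each frame with the next one, joint by joint.
    for prev, curr in zip(frames, frames[1:]):
        for (x0, y0, z0), (x1, y1, z1) in zip(prev, curr):
            vx = (x1 - x0) * fps
            vy = (y1 - y0) * fps
            vz = (z1 - z0) * fps
            total += vx * vx + vy * vy + vz * vz
            count += 1
    return total / count if count else 0.0
```

Under this proxy, energy grows with the square of speed, so a gesture performed at roughly one-third of the speed would carry about one-tenth of the energy.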

When testing an AI model (ST-GCN, a spatial-temporal graph convolutional network), the researchers found a critical blind spot. Although the AI achieved 69% accuracy, it did not 'think' like a human (Figure 2). Human observers across the three cultures (Figure 3) showed high agreement with one another (correlations above 0.79), whereas the AI's judgments barely aligned with human perception (a correlation of just 0.26). Humans use cognitive "inverse planning" to infer the hidden mental goals behind an action. The AI, by contrast, merely matched physical patterns and failed to register the heavy social meaning behind subtle, passive-aggressive motions. Consider someone standing very still, arms crossed tight, body turned slightly away: the AI sees almost no motion and treats the pose as harmless, while a human reads it instantly as "back off." Simply put, the movements that confused human observers were completely different from the ones that confused the AI.
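
The difference between accuracy and alignment can be made concrete with a small sketch. The release does not state which correlation measure or per-clip quantities were used, so the example below assumes a Pearson correlation over per-clip "hostile" scores, and every number in it is invented toy data rather than study results.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def accuracy(scores, labels, threshold=0.5):
    """Share of clips whose thresholded score matches the intended label."""
    hits = sum((s >= threshold) == bool(l) for s, l in zip(scores, labels))
    return hits / len(labels)

# Toy data (hypothetical): 1 = performer intended hostility, 0 = friendliness.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Assumed per-clip scores: share of observers answering "hostile",
# and the model's predicted probability of "hostile".
human = [0.9, 0.8, 0.6, 0.4, 0.1, 0.2, 0.3, 0.6]
model = [0.6, 0.4, 0.9, 0.7, 0.6, 0.1, 0.2, 0.2]

print("human accuracy:", accuracy(human, labels))
print("model accuracy:", accuracy(model, labels))
print("human-model correlation:", round(pearson(human, model), 2))
```

In this toy set the two raters reach the same accuracy while making their mistakes on different clips, so their per-clip scores correlate only weakly. That is the pattern the release describes: similar headline accuracy, different errors.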

This "alignment gap" presents a safety risk for human-machine interaction. A system that correctly classifies high-energy threats but remains blind to low-energy hostility may fail to de-escalate subtle conflicts. Bridging this gap will require AI that is not only accurate but also perceptually aligned with human social cognition, capable of interpreting not just how people move, but what those movements mean.

Title: Friend or Foe? Benchmarking Human Perception and ST-GCN Decoding of Embodied Social Intention

Authors: Miao Cheng, Zhan Dai, Victor Schneider, Kanta Ozawa, Yangyang Cai, Ken Fujiwara, Yoshifumi Kitamura, Chia-huei Tseng

Conference: 2026 International Conference on Automatic Face and Gesture Recognition (FG)
Attached files
  • This video captures the performers' friendly body movements. ©Tohoku University
  • This video captures the performer's hostile body movements. ©Tohoku University
Regions: Asia, Japan, China, Taiwan
Keywords: Applied science, Artificial Intelligence, Technology, Society, Psychology

