Embodied Interactive Intelligence Towards Autonomous Driving


25/03/2026 Frontiers Journals

Researchers have developed an end-to-end autonomous driving system that demonstrates advanced capabilities in understanding and responding to complex social interactions on the road. The system, termed embodied interactive intelligence towards autonomous driving (EIIAD), addresses fundamental challenges in how autonomous vehicles comprehend human behaviors and align with human intentions during traffic encounters.

The research, published in Engineering, introduces a unified constrained vehicle–environment interaction (UniCVE) model that processes multimodal sensory inputs to generate socially compatible driving behaviors. The system has been deployed on Dongfeng autonomous buses, which have traveled 22 000 kilometers and completed 45 000 navigation tasks in Xiong'an New Area, China.

At the core of the framework lies a perception–cognition–behavior closed-loop feedback paradigm. The perception module employs a hypergraph neural network based on multiview spatiotemporal features (HGNN-MSTF) to recognize human actions and gestures. This model constructs high-order semantic associations of human joints, enhancing accuracy under challenging conditions such as occlusion and varying lighting. For vehicle-to-vehicle interactions, the researchers developed a joint trajectory prediction world model within a deep reinforcement learning framework (JTPWM-DRL), which predicts the driving trajectories of autonomous vehicles and surrounding vehicles to assess potential collision risks.
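The paper does not publish the JTPWM-DRL internals, but the core idea of the collision-risk check it describes can be illustrated with a minimal sketch: given predicted ego and surrounding-vehicle trajectories over a shared horizon, flag a conflict whenever the predicted paths come within a safety distance. The function names and the 3-metre threshold below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def min_separation(ego_traj, other_traj):
    """Minimum Euclidean distance between two predicted trajectories.

    ego_traj, other_traj: (T, 2) arrays of predicted (x, y) positions
    sampled at the same timestamps over the prediction horizon.
    """
    d = np.linalg.norm(ego_traj - other_traj, axis=1)
    return float(d.min())

def collision_risk(ego_traj, other_trajs, safe_dist=3.0):
    """Flag a potential conflict if any surrounding vehicle's predicted
    path comes within `safe_dist` metres of the ego path."""
    return any(min_separation(ego_traj, t) < safe_dist for t in other_trajs)

# Example: ego drives straight along x; another vehicle crosses its path.
horizon = np.linspace(0.0, 5.0, 11)  # 0-5 s in 0.5 s steps
ego = np.stack([10.0 * horizon, np.zeros_like(horizon)], axis=1)
crossing = np.stack([np.full_like(horizon, 25.0), 20.0 - 8.0 * horizon], axis=1)

print(collision_risk(ego, [crossing]))  # the paths intersect -> True
```

In practice the predicted trajectories would come from the learned world model rather than closed-form motion, and the risk signal would feed the reinforcement-learning policy rather than a boolean print.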

The cognition module integrates these perception capabilities with traffic rule recognition and large language model guidance. Human knowledge is distilled into the real-time system through tailored reward functions, enabling the vehicle to interpret complex scenarios such as pedestrians waving at bus stops versus zebra crossings. The behavior module then generates optimal driving policies by unifying multiple objectives—including safety requirements, traffic regulations, and social compatibility—within a reinforcement learning actor-critic architecture.
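The article says human knowledge is distilled into the policy through tailored reward functions that unify safety, traffic rules, and social compatibility. A toy sketch of such multi-objective reward shaping is below; the term names and weights are assumptions for illustration, not the reward design actually used in UniCVE.

```python
def shaped_reward(collision, rule_violation, progress, yielded_to_human,
                  w_safety=10.0, w_rules=2.0, w_progress=1.0, w_social=1.5):
    """Combine safety, traffic-rule, progress, and social-compatibility
    terms into one scalar reward for an actor-critic learner.

    collision, rule_violation, yielded_to_human: booleans for this step.
    progress: normalized forward progress along the route (0..1).
    Weights are illustrative; in practice they would be tuned so that
    safety dominates the other objectives.
    """
    r = w_progress * progress            # reward forward progress
    r -= w_safety * float(collision)     # heavy penalty for any collision
    r -= w_rules * float(rule_violation) # penalty for breaking a traffic rule
    r += w_social * float(yielded_to_human)  # e.g. stopping for a waving pedestrian
    return r

print(shaped_reward(collision=False, rule_violation=False,
                    progress=1.0, yielded_to_human=True))  # 2.5
```

The actor-critic architecture then maximizes the expected sum of such rewards, which is how the competing objectives listed above are traded off inside a single policy.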

Experimental validation on the CARLA simulation platform demonstrated the system’s performance across multiple interaction scenarios. In vehicle-to-human interaction tests, the HGNN-MSTF achieved accuracies of 91.2% on cross-performer evaluation and 96.5% on cross-view evaluation using the NTU-RGB+D dataset. The model reduces parameters by 81.6% compared with Dynamic GCN models while maintaining comparable performance.

For vehicle-to-vehicle interactions, the researchers evaluated three complex scenarios: narrow road encounters, overtaking and lane changes, and unprotected left turns against oncoming traffic. The system uses time-to-intersection and time-headway metrics to monitor interaction states and calculate right-of-way. In narrow road encounters, the autonomous bus demonstrates cooperative yielding behavior, adjusting its velocity profile to allow oncoming vehicles to pass safely. During overtaking maneuvers, the system continuously recalculates right-of-way and adopts efficient strategies to execute reasonable lane changes.
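The two metrics named above have standard definitions, sketched below; the right-of-way rule and the 1.5-second safety margin are illustrative assumptions rather than the thresholds reported in the paper.

```python
def time_to_intersection(dist_to_conflict, speed):
    """Seconds until a vehicle reaches the conflict point; infinite if stopped.

    dist_to_conflict: metres from the vehicle to the conflict point.
    speed: current speed in m/s.
    """
    return dist_to_conflict / speed if speed > 1e-6 else float("inf")

def time_headway(gap, follower_speed):
    """Bumper-to-bumper gap to the lead vehicle divided by the
    follower's speed (seconds)."""
    return gap / follower_speed if follower_speed > 1e-6 else float("inf")

def has_right_of_way(ego_tti, other_tti, margin=1.5):
    """Grant the ego vehicle priority only if it reaches the conflict
    point at least `margin` seconds before the other vehicle;
    otherwise it should yield."""
    return ego_tti + margin < other_tti

# Ego is 30 m from the conflict point at 10 m/s; the other vehicle
# is 60 m away at 10 m/s.
ego_tti = time_to_intersection(30.0, 10.0)     # 3.0 s
other_tti = time_to_intersection(60.0, 10.0)   # 6.0 s
print(has_right_of_way(ego_tti, other_tti))    # True: ego can proceed
```

Monitoring these quantities each control cycle is what lets the bus decide, in a narrow encounter or an overtake, whether to press on or to yield.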

Comparative experiments against state-of-the-art algorithms including InterFuser and LAV showed that the UniCVE model achieved a route completion rate of 100% and a driving score of 83 in comprehensive navigation tasks—exceeding LAV and InterFuser by 60.8% and 90.9% respectively. The system successfully avoided collisions while maintaining low variance across different experimental conditions.

A user study involving 40 participants evaluated the system’s interaction intelligence across five metrics: interpretability, safety, smartness, proficiency, and human-likeness. UniCVE achieved average scores of 7.67 for interpretability and 7.02 for safety, representing improvements of 34.2% and 198.7% over LAV and InterFuser respectively. Notably, the system maintained consistent performance across experienced drivers and non-drivers, with only 1.5% variance in overall ratings.

The researchers acknowledge certain limitations in edge case scenarios, particularly at intersections with significant visual occlusions where pedestrians may suddenly emerge from behind parked vehicles. However, they observed that the model demonstrates memory-based adaptive learning through repeated exposure to similar patterns, developing enhanced anticipatory behaviors in familiar high-risk areas.

Future work will focus on enhancing occlusion-aware perception, incorporating uncertainty-aware trajectory prediction, and strengthening memory-based modules to better anticipate latent risks across diverse intersection geometries.

The paper “Embodied Interactive Intelligence Towards Autonomous Driving” is authored by Nan Ma, Jia Pan, Yongjin Liu, Yajue Yang, Yiheng Han, Jiacheng Guo, Zhixuan Wu, Zecheng Yang, Zhiwei Yang, Deyi Li. Full text of the open access paper: https://doi.org/10.1016/j.eng.2025.09.032. For more information about Engineering, visit the website at https://www.sciencedirect.com/journal/engineering.
Embodied Interactive Intelligence Towards Autonomous Driving
Authors: Nan Ma, Jia Pan, Yongjin Liu, Yajue Yang, Yiheng Han, Jiacheng Guo, Zhixuan Wu, Zecheng Yang, Zhiwei Yang, Deyi Li
Publication: Engineering
Publisher: Elsevier
Date: Available online 3 December 2025
Regions: Asia, China
Keywords: Applied science, Engineering, Technology, Transport
