Towards spatial computing: recent advances in multimodal natural interaction for XR headsets


23.01.2026 Frontiers Journals

Researchers have conducted a comprehensive review of recent advances in multimodal natural interaction techniques for Extended Reality (XR) headsets, revealing significant trends in spatial computing technologies. This timely review analyzes how recent breakthroughs in artificial intelligence (AI) and large language models (LLMs) are transforming how users interact with virtual environments, offering valuable insights for the future development of more natural, efficient, and immersive XR experiences.
A research team led by Feng Lu systematically reviewed 104 papers published since 2022 in six top venues; their review article was published on 15 December 2025 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
With the widespread adoption of Extended Reality headsets like Microsoft HoloLens 2, Meta Quest 3, and Apple Vision Pro, spatial computing technologies are gaining increasing attention. Natural human-computer interaction is at the core of spatial computing, enabling users to interact with virtual elements through intuitive methods such as eye tracking, hand gestures, and voice commands.
The review classifies interactions based on application scenarios, operation types, and interaction modalities. Operation types are divided into seven categories, distinguishing between active interactions (where users input information) and passive interactions (where users receive feedback). Interaction modalities are explored across nine distinct types, ranging from unimodal interactions (gesture, gaze, speech, or tactile only) to various multimodal combinations.
Statistical analysis of the reviewed literature reveals significant trends. Hand gesture and eye gaze interactions, including their combined modalities, remain the most prevalent. However, there was a notable increase in speech-related studies in 2024, likely driven by recent advancements in LLMs. Among operation types, pointing and selection remains the most studied, although the number of such studies has decreased annually, possibly because this research area has matured. Conversely, research on locomotion, viewport control, typing, and querying has increased, reflecting growing attention to users' subjective experiences and the integration of LLMs.
The researchers also identified several challenges in current natural interaction techniques. For example, gesture-only interactions often require users to adapt to complex paradigms, which increases cognitive load. Eye gaze interactions face issues with the "Midas touch" problem, where users unintentionally select items they are merely looking at. Speech-based interactions struggle with latency and recognition accuracy.
Based on these findings, the research team suggests potential directions for future research, including:
  1. Developing more accurate and reliable natural interactions through multimodal integration and error recovery mechanisms
  2. Enhancing the naturalness, comfort, and immersion of XR interactions by reducing physical and cognitive load
  3. Leveraging AI and LLMs to enable more sophisticated, context-aware interactions
  4. Bridging interaction design and practical XR applications to encourage wider adoption
The paper includes detailed illustrations of various interaction techniques, such as gesture-based drawing, gaze vergence control, and LLM-based speech interactions, providing a valuable reference for researchers and practitioners in the field.
This review offers important insights for researchers designing natural and efficient interaction systems for XR, ultimately contributing to the advancement of spatial computing technologies that could transform how we interact with digital information in our daily lives.
DOI: 10.1007/s11704-025-41123-8
Attached documents
  • Taxonomy of Operation Types and Interaction Modalities
  • Application Scenarios of Multimodal Natural Interaction
Regions: Asia, China
Keywords: Applied science, Computing


