Offline Model-Based Reinforcement Learning with Causal Structured World Models

21.05.2025 Frontiers Journals

Model-based methods have recently shown promise for offline reinforcement learning (RL), which aims to learn good policies from historical data without interacting with the environment.

Previous model-based offline RL methods employ a straightforward prediction method that maps the states and actions directly to the next-step states.

However, such a prediction method tends to capture spurious relations caused by the sampling policy preference behind the offline data.

The environment model should instead focus on causal influences, which helps in learning an effective policy that generalizes well to unseen states.

To address this problem, a research team led by Yang Yu from LAMDA, Nanjing University, published their research on 15 Apr 2025 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.

The team first shows theoretically, by incorporating the causal structure into the generalization error bound, that causal environment models can outperform plain environment models in offline RL. They also propose a practical algorithm, oFfline mOdel-based reinforcement learning with CaUsal Structured World Models (FOCUS), to illustrate the feasibility of learning and leveraging causal structure in offline RL.

Learning the causal structure from offline data, also known as causal discovery from observations, is a crucial phase of FOCUS. However, causal discovery from observations requires a large number of hypothesis tests, which is computationally expensive. To tackle this problem, the team exploits the time-series property of RL data to reduce the number of tests. Specifically, they incorporate the constraint that the future cannot cause the past into the PC algorithm, which seeks to uncover causal relationships from inferred conditional independence relations. This constraint reduces the number of conditional independence tests and determines the causal direction. In addition, they employ kernel-based conditional independence (KCI) tests, which can be applied to continuous variables without assuming a specific functional form between the variables or a particular data distribution.
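The effect of the temporal constraint on the search space can be illustrated with a minimal sketch. It is not the FOCUS implementation: the variable naming scheme and both function names are hypothetical, and it assumes a simplified setting in which only edges from time step t to time step t+1 are considered. It only counts the variable pairs that would need conditional independence testing, with and without the rule that the future cannot cause the past.

```python
from itertools import combinations


def unconstrained_pairs(n_state, n_action):
    """All unordered variable pairs a plain PC skeleton search would start from.

    Variables are the state and action dimensions at time t plus the state
    dimensions at time t+1. Edge directions are still unknown at this stage.
    """
    variables = (
        [f"s{i}_t" for i in range(n_state)]
        + [f"a{j}_t" for j in range(n_action)]
        + [f"s{i}_t1" for i in range(n_state)]
    )
    return list(combinations(variables, 2))


def time_constrained_pairs(n_state, n_action):
    """Pairs left when only time-t variables may cause time-(t+1) variables.

    Each pair is already oriented as (cause, effect), so no separate
    orientation step is needed. This is a simplifying assumption.
    """
    causes = [f"s{i}_t" for i in range(n_state)]
    causes += [f"a{j}_t" for j in range(n_action)]
    effects = [f"s{i}_t1" for i in range(n_state)]
    return [(cause, effect) for cause in causes for effect in effects]


if __name__ == "__main__":
    # Hypothetical environment with 3 state dimensions and 1 action dimension.
    print(len(unconstrained_pairs(3, 1)))     # 21
    print(len(time_constrained_pairs(3, 1)))  # 12
```

In this toy setting the constraint removes the pairs within a single time step and fixes the direction of every remaining pair, which is the sense in which fewer tests are needed and the causal direction is determined.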

The experimental results validate the theoretical claims, showing that FOCUS outperforms baseline models and other existing causal model-based RL (MBRL) algorithms in the offline setting.

DOI: 10.1007/s11704-024-3946-y

Zhengmao ZHU, Honglong TIAN, Xionghui CHEN, Kun ZHANG, Yang YU. Offline model-based reinforcement learning with causal structured world models. Front. Comput. Sci., 2025, 19(4): 194347, https://doi.org/10.1007/s11704-024-3946-y
Attached documents
  • Figure 1. The architecture of FOCUS. Given offline data, FOCUS learns a matrix of $p$-values with the KCI test and then obtains the causal structure by choosing a $p$-value threshold. After combining the learned causal structure with the neural network, FOCUS learns the policy through an offline MBRL algorithm.
  • Figure 2. The basic, impossible and compound situations of the causation between target variables and condition variables. Basic situations, top row: (a)-(d) list the situations in which the condition variable is at time step $t$. Bottom row: similarly, (e)-(h) list the situations in which the condition variable is at time step $t+1$.
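The thresholding step described in the Figure 1 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the example $p$-values and the 0.05 threshold are all hypothetical, and the convention that a small $p$-value (independence rejected) means an edge is kept is assumed here.

```python
def causal_mask(p_values, threshold=0.05):
    """Turn a p-value matrix into a 0/1 causal structure mask.

    p_values[i][j] is the p-value of an independence test between input
    variable i (a state or action dimension at time t) and next-state
    dimension j. A p-value below the threshold rejects independence, so
    the edge i -> j is kept. The 0.05 default is an assumed value.
    """
    return [[1 if p < threshold else 0 for p in row] for row in p_values]


def masked_inputs(inputs, mask, target):
    """Keep only the inputs that are causal parents of one next-state dimension."""
    return [x if mask[i][target] else 0.0 for i, x in enumerate(inputs)]


if __name__ == "__main__":
    # Hypothetical p-values: rows are (s0, s1, a0), columns are (s0', s1').
    p_values = [
        [0.001, 0.40],
        [0.30, 0.002],
        [0.01, 0.03],
    ]
    mask = causal_mask(p_values)
    print(mask)                                      # [[1, 0], [0, 1], [1, 1]]
    print(masked_inputs([2.0, -1.0, 0.5], mask, 0))  # [2.0, 0.0, 0.5]
```

A network that predicts one next-state dimension would then see only the masked inputs, which is one simple way a learned structure could be combined with a neural network.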
Regions: Asia, China
Keywords: Applied science, Computing
