DSR: Optimization of Performance Lower Bound for Hierarchical Policy with Dynamical Skill Refinement

01/07/2026 HEP Journals

Skill-based reinforcement learning (RL) has become a mainstream approach to sparse-reward decision-making tasks. Skills extracted from demonstration datasets provide temporal abstraction. In previous skill-based RL methods, however, the skills are kept fixed during online learning, which leads to sub-optimal asymptotic performance when the dataset contains only sub-optimal behavior modes.

To address this problem, a research team led by Ying Wen published their research on 15 June 2026 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.

The team proposed a skill-based RL method that fine-tunes the entire hierarchical policy under a unified optimization objective via a dynamical skill refinement mechanism. The method was evaluated on multiple sparse-reward robotic manipulation tasks. Compared with state-of-the-art (SOTA) methods, it achieves higher asymptotic performance and more stable performance improvement.

In this work, the authors propose to optimize the hierarchical policy’s performance in the TA-MDP. They prove that the unified optimization objective guarantees performance improvement in the TA-MDP and, in effect, optimizes a performance lower bound in the original MDP, which supports the method’s effectiveness. They formulate skill refinement as a residual policy that predicts dynamically weighted action increments, which avoids collapse of the skill space. At the end of each epoch, the high-level and low-level policies are updated simultaneously in an on-policy manner, which circumvents temporal abstraction shift.

Specifically, the weight of the action increment is determined dynamically according to the level of skill refinement in the current state. The paper measures the refinement level through random network distillation (RND).
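
The idea of a residual action increment scaled by an RND-based weight can be illustrated with a minimal sketch. This is not the authors' implementation: the linear networks, the class `RND`, the helpers `refinement_weight` and `refined_action`, and in particular the mapping from RND prediction error to a weight (`exp(-error / temperature)`, so that the weight grows as the error shrinks) are all assumptions made for illustration. The release does not state the actual functional form or its direction.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Random linear map stored as a list of weight rows (hypothetical helper)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, x):
    """Matrix-vector product for the list-of-rows representation."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class RND:
    """Tiny random network distillation module using linear networks.

    A fixed random target network is imitated by a trainable predictor.
    The prediction error is large in rarely seen states and shrinks in
    states the predictor has been trained on.
    """

    def __init__(self, state_dim, feat_dim=4, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.target = make_linear(state_dim, feat_dim, rng)     # never trained
        self.predictor = make_linear(state_dim, feat_dim, rng)  # trained below
        self.lr = lr

    def error(self, state):
        """Mean squared error between predictor and target features."""
        t = apply_linear(self.target, state)
        p = apply_linear(self.predictor, state)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)

    def update(self, state):
        """One gradient-descent step on the mean squared error."""
        t = apply_linear(self.target, state)
        p = apply_linear(self.predictor, state)
        for row, pi, ti in zip(self.predictor, p, t):
            grad_scale = 2.0 * (pi - ti) / len(t)
            for j, xj in enumerate(state):
                row[j] -= self.lr * grad_scale * xj


def refinement_weight(rnd_error, temperature=1.0):
    """ASSUMPTION: map RND error to a weight in (0, 1]; lower error, higher weight."""
    return math.exp(-rnd_error / temperature)


def refined_action(skill_action, increment, weight):
    """Skill action plus the weighted residual increment."""
    return [a + weight * d for a, d in zip(skill_action, increment)]


state = [0.5, -0.2, 0.1]
rnd = RND(state_dim=3)
w_before = refinement_weight(rnd.error(state))
for _ in range(200):
    rnd.update(state)  # the state becomes familiar to the predictor
w_after = refinement_weight(rnd.error(state))

# Hypothetical 2-D skill action and residual increment.
action = refined_action([0.1, 0.0], [0.2, -0.4], w_after)
```

In this sketch the weight for a state rises from `w_before` to `w_after` as the predictor is trained on that state, so the residual increment influences the final action more in states where refinement has had more training. Other mappings from the RND error to a weight would fit the description equally well.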

Future work could explore other measures of the skill refinement level. Finding a more compact performance lower bound is also an important open issue.
DOI: 10.1007/s11704-025-50561-3
Regions: Asia, China
Keywords: Applied science, Computing
