Deep Reinforcement Learning (DRL) has become a powerful model-free framework for learning optimal policies. However, in real-world navigation tasks, DRL methods often suffer from insufficient exploration, especially in cluttered scenarios with sparse rewards or complex dynamics under system disturbances. To overcome this challenge, we bridge general graph-based motion planning with DRL, enabling RL agents to explore their environments comprehensively and achieve optimal performance. Specifically, we design a dense reward function based on a graph structure that spans the entire state space. This graph serves as a rich source of information, guiding the RL agent toward discovering optimal strategies. We validate our approach in dynamic and challenging environments, demonstrating significant improvements in exploration efficiency and task success rates.
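The abstract does not include code, so the following is a minimal sketch of how we read the graph-construction step: sample states in the free space, connect nearby collision-free samples (an RRG-style random geometric graph), and precompute every node's shortest-path distance to the goal. The workspace bounds, circular obstacles, the `build_state_graph` name, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the authors' code): random geometric graph over the free 2-D
# workspace, with each node's graph-geodesic distance to the goal precomputed.
import numpy as np
import networkx as nx

def build_state_graph(goal, obstacles, n_samples=500, radius=0.15, seed=0):
    """Sample collision-free states, connect nearby ones, and precompute
    each node's shortest-path (graph) distance to the goal."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(0.0, 1.0, size=(n_samples, 2))
    pts = np.vstack([pts, goal])                      # make sure the goal is a node
    free = [p for p in pts
            if all(np.linalg.norm(p - c) > r for c, r in obstacles)]

    g = nx.Graph()
    for i, p in enumerate(free):
        g.add_node(i, pos=p)
    for i in range(len(free)):
        for j in range(i + 1, len(free)):
            d = float(np.linalg.norm(free[i] - free[j]))
            if d < radius:                            # edge collision checks omitted for brevity
                g.add_edge(i, j, weight=d)

    goal_idx = min(g.nodes, key=lambda i: np.linalg.norm(g.nodes[i]["pos"] - goal))
    dist_to_goal = nx.single_source_dijkstra_path_length(g, goal_idx, weight="weight")
    return g, dist_to_goal

goal = np.array([0.9, 0.9])
obstacles = [(np.array([0.5, 0.5]), 0.1)]             # circular obstacles: (center, radius)
graph, dist_to_goal = build_state_graph(goal, obstacles)
```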
The key idea of our work is to integrate a graph-based structure with model-free reinforcement learning to enhance exploration in complex environments. This integration provides structured guidance for agents, addressing the common issue of poor exploration in cluttered or high-dimensional state spaces.
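Given such a graph, one way to turn it into a dense reward (again a hedged sketch, not the paper's code) is to reward the agent on each transition for reducing its graph-geodesic distance to the goal, added on top of the environment's sparse reward. The helper names and the scaling factor `k` are assumptions; the snippet reuses `graph` and `dist_to_goal` from the sketch above.

```python
# Hedged sketch of the dense-reward idea: bonus for progress along the graph toward the goal.
import numpy as np

def nearest_node_distance(state_xy, graph, dist_to_goal):
    """Graph distance to the goal from the node closest to the agent's position."""
    idx = min(dist_to_goal,
              key=lambda i: np.linalg.norm(graph.nodes[i]["pos"] - state_xy))
    return dist_to_goal[idx]

def dense_reward(prev_xy, next_xy, sparse_r, graph, dist_to_goal, k=1.0):
    """Environment's sparse reward plus a bonus for reducing graph distance to the goal."""
    progress = (nearest_node_distance(prev_xy, graph, dist_to_goal)
                - nearest_node_distance(next_xy, graph, dist_to_goal))
    return sparse_r + k * progress
```

Because this shaping term only reads the state, it can be dropped into the experience-collection loop of any model-free algorithm (e.g., SAC or PPO) without modifying the learner itself.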
Contributions: We propose a novel graph-based framework that is compatible with a wide range of model-free reinforcement learning algorithms to improve exploration efficiency. Compared to prior methods, our approach enables more complete coverage of the environment’s state space, thus fully utilizing the strengths of model-free RL in learning optimal policies under unknown dynamics. We provide a theoretical guarantee that our exploration strategy preserves the original RL objective and accelerates convergence. Moreover, our framework allows agents to generalize across arbitrary initial states without retraining or policy modification, making it practical for real-world deployment across diverse scenarios.
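On the claim that the exploration strategy preserves the original RL objective: the paper's proof is not reproduced here, but a standard way to obtain this kind of guarantee is potential-based reward shaping (Ng et al., 1999), where the shaping term takes the form gamma * Phi(s') - Phi(s) and adding it is known not to change the set of optimal policies. Whether the paper's reward takes precisely this form is our assumption; the sketch below only shows the shape of that construction, with Phi set to the negative graph distance from the earlier sketches.

```python
# Hedged sketch: potential-based shaping, assuming Phi(s) = -(graph distance to goal).
def shaped_reward(sparse_r, phi_prev, phi_next, gamma=0.99):
    """F(s, s') = gamma * Phi(s') - Phi(s); adding F leaves the optimal policies
    of the original MDP unchanged (Ng et al., 1999)."""
    return sparse_r + gamma * phi_next - phi_prev

# Illustrative potential built from the earlier sketches (an assumption, not the paper's code):
# phi = lambda xy: -nearest_node_distance(xy, graph, dist_to_goal)
```

The same graph lookup is consistent with the arbitrary-initial-state property: since the graph spans the entire state space, the potential is defined wherever an episode starts, so no retraining or policy modification is required.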
| Obstacles | Dynamic model | Baseline | Success rate |
| --- | --- | --- | --- |
| Static obstacles | Quadrotor | RRG | 100% |
| Static obstacles | Quadrotor | RRT | 100% |
| Static obstacles | Quadrotor | Binary | 0% |
| Dynamic obstacles | Vehicle | RRG | 100% |
| Dynamic obstacles | Vehicle | RRT | 0% |
| Dynamic obstacles | Vehicle | Binary | 0% |
@misc{luo2025bridgingdeepreinforcementlearning,
title={Bridging Deep Reinforcement Learning and Motion Planning for Model-Free Navigation in Cluttered Environments},
author={Licheng Luo and Mingyu Cai},
year={2025},
eprint={2504.07283},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2504.07283},
}