Publication

Average Reward Reinforcement Learning for Wireless Radio Resource Management

Publication Info

Publisher:

IEEE

Abstract

In this paper, we address a crucial but often overlooked issue in applying reinforcement learning (RL) to radio resource management (RRM) in wireless communications: the mismatch between the discounted reward RL formulation and the undiscounted goal of wireless network optimization. To the best of our knowledge, we are the first to systematically investigate this discrepancy, starting with a discussion of the problem formulation followed by simulations that quantify the extent of the gap. To bridge this gap, we introduce the use of average reward RL, a method that aligns more closely with the long-term objectives of RRM. We propose a new method called the Average Reward Off-policy Soft Actor-Critic (ARO-SAC), which is an adaptation of the well-known Soft Actor-Critic algorithm in the average reward framework. This new method achieves a significant performance improvement: our simulation results demonstrate a 15% gain in system performance over the traditional discounted reward RL approach, underscoring the potential of average reward RL in enhancing the efficiency and effectiveness of wireless network optimization.
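The contrast the abstract draws, between the discounted objective that standard deep RL algorithms such as Soft Actor-Critic optimize and the average reward criterion that better matches long-run RRM metrics, can be sketched with the textbook formulations below; the paper's own notation and derivation may differ. The discounted return is

$$J_\gamma(\pi) = \mathbb{E}_\pi\Big[\sum_{t=0}^{\infty} \gamma^t\, r(s_t, a_t)\Big],$$

whereas the long-run average reward is

$$\rho(\pi) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\Big[\sum_{t=0}^{T-1} r(s_t, a_t)\Big].$$

In the average reward framework, the critic learns a differential (bias) value function satisfying

$$Q^\pi(s, a) = r(s, a) - \rho(\pi) + \mathbb{E}_{s',\, a' \sim \pi}\big[Q^\pi(s', a')\big],$$

so an average reward adaptation of Soft Actor-Critic replaces the $\gamma$-discounted backup with this $\rho$-corrected backup and estimates $\rho$ alongside the critic; see the paper for ARO-SAC's exact construction.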

Citation

K. Yang, J. Yang and C. Shen, "Average Reward Reinforcement Learning for Wireless Radio Resource Management," 2024 58th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 2024, pp. 1188-1193, doi: 10.1109/IEEECONF60004.2024.10942648.

Contributors

K. Yang, J. Yang, and C. Shen

Info

Date:
April 4, 2025
Type:
Conference Paper
DOI:
10.1109/IEEECONF60004.2024.10942648