Efficient Methods for Deep Reinforcement Learning: Algorithms and Applications
Doctor of Philosophy
Deep reinforcement learning (deep RL) has recently achieved remarkable success in various domains, from simulated games to real-world applications. However, deep RL agents are notoriously sample-inefficient; they often need to collect a large number of samples from the environment to achieve reasonable performance. This sample efficiency issue becomes more pronounced in sparse reward environments, where the rewards are zero in most states, leaving agents with too little learning signal to make progress. Unfortunately, collecting samples can be extremely expensive in many real-world applications; we may only be able to collect a very limited number of samples for training. The sample efficiency issue significantly hinders the application of deep RL in the real world. To bridge this gap, this thesis makes several contributions to efficient deep RL. First, we propose a learning-based experience replay algorithm that improves sample efficiency through better sample reuse. Second, we present an episode-level exploration strategy for efficient exploration in sparse reward environments. Third, we investigate a real-world application, embedding table sharding, and design an efficient training algorithm based on an estimated environment. Finally, we devise a more general framework that leverages pre-trained models to improve efficiency, and apply it to embedding table sharding. Taken together, our research could help build more efficient deep RL systems and facilitate their real-world deployment.
Reinforcement Learning; Machine Learning; Recommender Systems; Machine Learning Systems