This repository contains a PyTorch implementation of the Deep Q-Network (DQN) reinforcement learning algorithm for playing Atari games. The code is based on the paper "Playing Atari with Deep Reinforcement Learning" by Mnih et al. (2013). The implementation is designed to be simple and easy to understand, while still being efficient and effective.
To install the required packages, you can use the following command:

```bash
pip install -r requirements.txt
```

To run the code, you need to have gymnasium (the open-source fork of OpenAI Gym) and the Atari environments installed. You can install them using the following command:

```bash
pip install gymnasium ale_py
```

To train the agent on an Atari game, you can use the following command:
```bash
python src/train_agent.py --env <ENV_NAME> --num_frames <NUM_FRAMES>
```

Replace `<ENV_NAME>` with the name of the Atari environment you want to train on (e.g., ALE/Pong-v5) and `<NUM_FRAMES>` with the number of frames you want to train for. Remember to include the `ALE/` prefix for Atari environments.

To evaluate the agent, you can use the following command:
```bash
python src/test_agent.py --env <ENV_NAME> --num_episodes <NUM_EPISODES> --render_mode <RENDER_MODE>
```

Replace `<ENV_NAME>` with the name of the Atari environment you want to evaluate on (e.g., ALE/Pong-v5), `<NUM_EPISODES>` with the number of episodes you want to evaluate for, and `<RENDER_MODE>` with the render mode (e.g., human or rgb_array).

The repository is organized as follows:

```
├── src/
│   ├── agent.py
│   ├── config.py
│   ├── graph_results.py
│   ├── model.py
│   ├── test_all_agents.py
│   ├── test_agent.py
│   └── train_agent.py
├── logs_{ENV_NAME}/
│   ├── {AGENT_NAME}_model_final.pth
│   ├── results_{AGENT_NAME}.csv
│   ├── avg_q_value_vs_episode.png
│   ├── avg_reward_vs_episode.png
│   ├── reward_vs_episode.png
│   └── reward_log.csv
├── README.md
└── requirements.txt
```
The configuration was adjusted so the agent can be trained on a single GPU or on Apple Silicon using Metal. The configuration file is located in src/config.py. You can modify the hyperparameters and other settings in this file to suit your needs.
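For reference, the device fallback that makes this possible in PyTorch is small; the following is a minimal sketch of the kind of selection logic involved (the exact code in src/config.py may differ):

```python
import torch

def select_device() -> torch.device:
    """Pick the fastest available backend: CUDA GPU, Apple Metal (MPS), or CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

DEVICE = select_device()
```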
The training and evaluation results are logged in the logs_{ENV_NAME} directory. The logs include the following files:
- `{AGENT_NAME}_model_final.pth`: The final model weights of the trained agent.
- `results_{AGENT_NAME}.csv`: The training and evaluation results, including average rewards and Q-values.
- `avg_q_value_vs_episode.png`: A plot of the average Q-value vs. episode.
- `avg_reward_vs_episode.png`: A plot of the average reward vs. episode.
- `reward_vs_episode.png`: A plot of the reward vs. episode.
- `reward_log.csv`: A CSV file containing the reward log for each episode.
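As an illustration of how the saved weights might be reloaded for evaluation, here is a rough sketch; the class name `DQN`, its constructor arguments, and the file paths are assumptions, since the real definitions live in src/model.py and src/test_agent.py:

```python
import torch

from model import DQN  # Hypothetical class name; the real one is defined in src/model.py.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Assumed constructor: the network maps stacked frames to one Q-value per action.
model = DQN(num_actions=6).to(device)

# Illustrative path following the logs_{ENV_NAME}/{AGENT_NAME}_model_final.pth pattern.
state_dict = torch.load("logs_Pong/dqn_model_final.pth", map_location=device)
model.load_state_dict(state_dict)
model.eval()  # Switch to inference mode for evaluation.
```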
The tables below compare our agent ("DQN Small") against the baselines reported in the original paper:

| Method | B. Rider | Breakout | Enduro | Pong | Q*bert | Seaquest | S. Invaders |
|---|---|---|---|---|---|---|---|
| Random | 354 | 1.2 | 0 | -20.4 | 157 | 110 | 179 |
| Sarsa | 996 | 5.2 | 129 | −19 | 614 | 665 | 271 |
| Contingency | 1743 | 6 | 159 | −17 | 960 | 723 | 268 |
| DQN | 4092 | 168 | 470 | 20 | 1952 | 1705 | 581 |
| DQN Small (Ours) | 5208 | 40 | 502 | 13 | 3003 | 1480 | 416 |
| Human | 7456 | 31 | 368 | −3 | 18900 | 28010 | 3690 |

| Best | B. Rider | Breakout | Enduro | Pong | Q*bert | Seaquest | S. Invaders |
|---|---|---|---|---|---|---|---|
| HNeat Best | 3616 | 52 | 106 | 19 | 1800 | 920 | 1720 |
| HNeat Pixel | 1332 | 4 | 91 | −16 | 1325 | 800 | 1145 |
| DQN Best | 5184 | 225 | 661 | 21 | 4500 | 1740 | 1075 |
| DQN Small Best (Ours) | 8896 | 77 | 973 | 19 | 4625 | 1660 | 750 |
Table 1: The upper table compares the average total reward for various learning methods; the lower table reports the reward of the single best-performing episode for the HNeat and DQN variants.
Why does our smaller DQN perform better in some environments?
- The smaller DQN most likely outperforms the original implementation in certain environments because its much smaller replay buffer (30k transitions instead of 1M) behaves like a FIFO queue of recent experience, keeping the training data closer to the current policy (see the sketch below). Its streamlined setup can also converge faster and overfit less to specific game patterns, and modern implementation details (gymnasium, current PyTorch) provide subtle optimizations that were not available in 2013. The reduced memory footprint acts as a useful constraint, encouraging efficient learning from fewer samples and suggesting that bigger models and buffers aren't always necessary for strong reinforcement learning performance.
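To make the recency point concrete, a buffer capped at 30k transitions silently evicts the oldest experience as new ones arrive, so uniform sampling is naturally biased toward recent behaviour. The following is a minimal sketch, not necessarily the exact code in src/agent.py:

```python
import random
from collections import deque

class ReplayBuffer:
    """FIFO replay buffer: once full, the oldest transition is dropped first."""

    def __init__(self, capacity: int = 30_000):  # BUFFER_SIZE from src/config.py
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Appending to a full deque evicts the oldest transition, so the buffer
        # always holds only the most recent `capacity` experiences.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform sampling over a small, recent window of experience.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```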
The key hyperparameter changes in src/config.py are:

```python
# AGENT
self.BUFFER_SIZE = 30000            # Reduced from 1M to 30k
self.GAMMA = 0.99
self.EPSILON_START = 1.0
self.EPSILON_END = 0.05
self.EPSILON_DECAY_STEPS = 1000000  # 1M frames for linear decay
self.TARGET_FREQ_UPDATE = 1000      # Reduced from 10k to 1k

# TRAIN_AGENT
self.NUM_FRAMES = 5000000           # Reduced from 50M to 5M
```

These config changes reduced the expected training time from ~363 hours to ~12 hours per experiment on a single GPU. The training time can be further reduced by using multiple GPUs or Apple Silicon with Metal.
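As an illustration of how `EPSILON_START`, `EPSILON_END`, and `EPSILON_DECAY_STEPS` typically interact, here is a minimal linear-decay sketch; the actual schedule in src/agent.py may be implemented differently:

```python
def epsilon_at(frame_idx: int,
               eps_start: float = 1.0,          # EPSILON_START
               eps_end: float = 0.05,           # EPSILON_END
               decay_steps: int = 1_000_000):   # EPSILON_DECAY_STEPS
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps frames."""
    fraction = min(frame_idx / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)

print(epsilon_at(500_000))    # 0.525 (halfway through the decay)
print(epsilon_at(2_000_000))  # 0.05  (stays at the floor after 1M frames)
```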
The training curves show the average Q-value and the average reward per episode over the course of training, so you can clearly see where the model performs well and where it doesn't. Both increase over time, indicating that the agent is learning to play the game effectively. The Q-value plateaus after a certain number of episodes, indicating that the agent has learned the game well and is no longer improving significantly. The per-episode reward curve also trends upward as the agent learns to play the game better.
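For reference, curves of this kind can be regenerated from `reward_log.csv`. The snippet below is a rough sketch rather than the repository's src/graph_results.py; the column names `episode` and `reward` and the file path are assumptions:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed layout: one row per episode with "episode" and "reward" columns.
log = pd.read_csv("logs_Pong/reward_log.csv")  # illustrative path

# Smooth the noisy per-episode reward with a 100-episode moving average.
log["reward_ma"] = log["reward"].rolling(window=100, min_periods=1).mean()

plt.plot(log["episode"], log["reward"], alpha=0.3, label="episode reward")
plt.plot(log["episode"], log["reward_ma"], label="100-episode moving average")
plt.xlabel("Episode")
plt.ylabel("Total reward")
plt.legend()
plt.savefig("reward_vs_episode.png")
```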



