A Deep Reinforcement Learning framework for training, evaluating, and analysing agents across four games. It supports multiple algorithms, configurable reward personas, difficulty presets, and a full metrics/graphing pipeline, all driven through an interactive menu.
| Key | Game | Termination |
|---|---|---|
| `flappy` | Flappy Bird | Hits pipe or ground |
| `snake` | Snake | Wall/self collision, or 1-min apple inactivity timeout |
| `pong` | Pong | Ball goes out of bounds |
| `dk` | Donkey Kong | Hit by barrel / falls |
| Key | Algorithm |
|---|---|
| `ppo` | PPO |
| `rppo` | RecurrentPPO |
| `trpo` | TRPO |
| `a2c` | A2C |
```
python -m venv venv
venv\Scripts\activate
pip install -U pip
pip install -r requirements.txt
```
```
python menu.py
```

1. Run Training
2. Run Evaluation (pick between normal or unlimited, pick game or all, pick difficulty)
3. View TensorBoard Logs
4. Play Game Manually (keyboard, pick difficulty)
5. Show Project Status
6. Run Random Baseline (pick game or all, pick difficulty)
7. Watch Trained Agent (pick game + model + difficulty)
8. Train All Models for One Game
9. Train Complete Grid
10. Metrics / Graphs
11. Record Gameplay (save MP4 or GIF)
12. Delete Logs / Models
13. Exit
```
DLR/
├── menu.py              # Interactive menu (main entry point)
├── requirements.txt
├── pyproject.toml
├── games.md             # Game-specific notes
├── tensorboard.md       # TensorBoard usage
│
└── code/
    ├── conf/
    │   ├── grid.yaml    # Global training config (games, models, personas, skills)
    │   ├── algo/        # Algorithm hyperparameters (ppo, rppo, trpo, a2c)
    │   ├── game/        # Game configs (snake, flappy, pong, dk)
    │   ├── reward/      # Persona reward configs
    │   │   ├── snake_baseline.yaml
    │   │   ├── flappy_baseline.yaml
    │   │   ├── pong_baseline.yaml
    │   │   └── dk_baseline.yaml
    │   ├── robustness/  # Difficulty overrides (easy / default / hard per game)
    │   └── callback/
    │
    ├── games/
    │   ├── snake_core.py
    │   ├── flappy_core.py
    │   ├── pong_core.py
    │   └── dk_core.py
    │
    ├── rewards/         # Reward function implementations
    ├── metrics/         # Per-game metrics collectors
    │   ├── snake_balance.py
    │   ├── flappy_balance.py
    │   ├── pong_balance.py
    │   └── dk_balance.py
    │
    ├── wrappers/
    │   └── generic_env.py  # Universal Gym wrapper (reward fn, HUD, dt, apple timeout)
    │
    └── scripts/
        ├── train.py
        ├── evaluate.py
        ├── manual_play.py
        ├── watch_agent.py
        ├── random_eval.py
        ├── record_gameplay.py
        ├── analyze_metrics.py
        ├── metrics_utils.py
        ├── callbacks.py
        └── play.py
```
`code/conf/grid.yaml` is the central config for grid-based experiments. Key fields:
```yaml
games: [flappy, snake, pong, dk]
models: [ppo, rppo, trpo, a2c]
personas: [flappy_baseline, snake_baseline, pong_baseline, dk_baseline]
seed: 1234
device: cpu
n_envs: 10
skills:
  Custom: 10000000   # timesteps; override with +skills.Custom=N
```

Each game has three difficulty configs in `code/conf/robustness/`:
- `<game>_default.yaml`: training conditions
- `<game>_easy.yaml`: forgiving settings
- `<game>_hard.yaml`: punishing settings
These override game parameters (grid size, speed, penalties, etc.) at evaluation and manual play time.
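For illustration only, a hard preset for Snake might look like the following sketch; these keys are guesses for the example, the real ones are whatever the game config in `code/conf/game/snake.yaml` defines:

```yaml
# code/conf/robustness/snake_hard.yaml -- illustrative sketch; the actual
# keys are defined by the game config it overrides.
grid_size: 10        # smaller arena than the default
speed: 1.5           # faster tick rate
death_penalty: -2.0  # harsher terminal penalty
```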
Persona reward configs live in `code/conf/reward/` and are named `<game>_<persona>.yaml`. They control reward shaping via a pluggable reward function passed into `GameEnv`.
Algorithm configs live in `code/conf/algo/`. Each file sets the hyperparameters for its algorithm (learning rate, `n_steps`, batch size, etc.).
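As a sketch (the values below are generic PPO defaults, not the repo's actual numbers), `ppo.yaml` might contain:

```yaml
# code/conf/algo/ppo.yaml -- illustrative values only, not the repo's actual ones.
learning_rate: 3.0e-4
n_steps: 2048
batch_size: 64
gamma: 0.99
```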
Via menu (option 1, 8, or 9), or directly:
```
python -m code.scripts.train game=snake model=ppo persona=snake_baseline skill=Custom +skills.Custom=5000000
```

Outputs:
- Best model: `models/best/<game>_<algo>_<persona>_<skill>/best_model.zip`
- Checkpoints: `models/checkpoints/`
- TensorBoard logs: `mylogs/`
- Eval logs: `models/eval_logs/`
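The algorithm list (PPO, RecurrentPPO, TRPO, A2C) matches Stable-Baselines3 plus sb3-contrib, and `best_model.zip` matches SB3's naming convention; assuming that is the stack here, a saved model can be reloaded for ad-hoc analysis roughly like this:

```python
# Hedged sketch: assumes best_model.zip is a Stable-Baselines3 checkpoint.
from stable_baselines3 import PPO

model = PPO.load("models/best/snake_ppo_snake_baseline_Custom/best_model")

# Inference would use an observation from the project's GameEnv wrapper;
# commented out here only to illustrate the call:
# action, _state = model.predict(obs, deterministic=True)
```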
Via menu (option 2), or directly:
```
python -m code.scripts.evaluate --game snake --algo ppo --difficulty default --episodes 100
```

Outputs a CSV: `<game>_<algo>_eval.csv`
Columns: `episode`, `game`, `algo`, `difficulty`, `score`, `apples`, `pipes`, `episode_return`, `episode_steps`, `terminated`, `truncated`, `forced_timeout`

Difficulty options: `easy`, `default`, `hard`, `all`
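Because the CSV schema is fixed, quick summaries are straightforward. For example, a pandas sketch (the file name assumes a Snake/PPO run sitting in the working directory):

```python
# Summarise an evaluation CSV produced by evaluate.py.
import pandas as pd

df = pd.read_csv("snake_ppo_eval.csv")

# Mean score and episode length per difficulty, plus the share of episodes
# ended by the inactivity timeout (assuming forced_timeout is boolean/0-1).
summary = df.groupby("difficulty").agg(
    mean_score=("score", "mean"),
    mean_steps=("episode_steps", "mean"),
    timeout_rate=("forced_timeout", "mean"),
)
print(summary)
```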
Controls:
- Snake: W/Up, S/Down, A/Left, D/Right, ESC
- Flappy: SPACE = flap, ESC
- Pong: W/Up = up, S/Down = down, ESC
- DK: W/A/S/D = move, SPACE = jump, ESC
Renders a trained model playing in real time; you pick the game, model, and difficulty from the menu.
Runs a random-action agent for comparison. Same CSV format as evaluation.
Saves an MP4 or GIF of a trained agent or manual play session. Output goes to outputs/.
Generates plots from training and evaluation CSVs. All outputs written to models/metrics/.
Available graphs:
- Reward vs timesteps (training curves)
- Score (apples / pipes) vs timesteps
- Evaluation bar charts comparing algorithms
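These are produced by `analyze_metrics.py`. As a rough illustration of the third graph type (a sketch, not the script's actual code), an algorithm-comparison bar chart could be built like this:

```python
# Sketch of an algorithm-comparison bar chart built from eval CSVs.
import pandas as pd
import matplotlib.pyplot as plt

algos = ["ppo", "rppo", "trpo", "a2c"]
means = [pd.read_csv(f"snake_{a}_eval.csv")["score"].mean() for a in algos]

plt.bar(algos, means)
plt.ylabel("Mean evaluation score")
plt.title("Snake: algorithm comparison")
plt.savefig("models/metrics/snake_algo_comparison.png", dpi=150)
```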
- Create `code/conf/reward/<game>_<persona>.yaml`
- Implement the reward logic in `code/rewards/` (a sketch follows this list)
- Add the persona name to `grid.yaml`
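As promised above, a rough sketch of a persona reward function; the `(prev_state, state, done)` signature and the state fields are assumptions of this example, not the project's actual interface:

```python
# Illustrative persona reward for Snake; the signature and state fields
# are assumptions for this sketch, not the project's real interface.
def cautious_snake_reward(prev_state, state, done):
    reward = 0.0
    if state["apples"] > prev_state["apples"]:
        reward += 1.0       # reward eating an apple
    if done:
        reward -= 1.0       # penalise collision or timeout
    reward -= 0.001         # small per-step cost to discourage stalling
    return reward
```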
- Implement `code/games/<game>_core.py` (must expose `get_action_space`, `get_observation_space`, `reset`, `step`, `render`); a skeleton follows this list
- Add `code/conf/game/<game>.yaml` with `_target_` pointing to the core class
- Add robustness configs: `code/conf/robustness/<game>_default/easy/hard.yaml`
- Add a persona config and metrics collector
- Register the game in `grid.yaml`
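A minimal core skeleton might look like the following (Gymnasium spaces and the `step` return shape are assumptions of this sketch; the real contract is set by `generic_env.py`):

```python
# Skeleton for code/games/<game>_core.py. Only the five methods named
# above are required; space types and return shapes are assumptions.
import numpy as np
from gymnasium import spaces


class MyGameCore:
    def get_action_space(self):
        return spaces.Discrete(4)  # e.g. four movement directions

    def get_observation_space(self):
        return spaces.Box(low=0.0, high=1.0, shape=(16,), dtype=np.float32)

    def reset(self):
        # Return the initial observation.
        return np.zeros(16, dtype=np.float32)

    def step(self, action):
        # Advance one tick; return (obs, info, done) -- the exact
        # contract is whatever generic_env.py expects.
        obs = np.zeros(16, dtype=np.float32)
        return obs, {}, False

    def render(self):
        # Draw the current frame; the rendering backend is project-specific.
        pass
```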
- `games.md`: per-game design notes
- `tensorboard.md`: how to launch and read TensorBoard