A Deep Reinforcement Learning framework for training, evaluating, and analysing agents across four games. It supports multiple algorithms, configurable reward personas, difficulty presets, and a full metrics/graphing pipeline, all driven through an interactive menu.
| Key | Game | Termination |
|---|---|---|
| `flappy` | Flappy Bird | Hits pipe or ground |
| `snake` | Snake | Wall/self collision, or 1-min apple inactivity timeout |
| `pong` | Pong | Ball goes out of bounds |
| `dk` | Donkey Kong | Hit by barrel / falls |
| Key | Algorithm |
|---|---|
| `ppo` | PPO |
| `rppo` | RecurrentPPO |
| `trpo` | TRPO |
| `a2c` | A2C |
```
python -m venv venv
venv\Scripts\activate
pip install -U pip
pip install -r requirements.txt
```
```
python menu.py
```

1. Run Training
2. Run Evaluation (pick between normal or unlimited, pick game or all, pick difficulty)
3. View TensorBoard Logs
4. Play Game Manually (keyboard, pick difficulty)
5. Show Project Status
6. Run Random Baseline (pick game or all, pick difficulty)
7. Watch Trained Agent (pick game + model + difficulty)
8. Train All Models for One Game
9. Train Complete Grid
10. Metrics / Graphs
11. Record Gameplay (save MP4 or GIF)
12. Delete Logs / Models
13. Exit
```
DLR/
├── menu.py              # Interactive menu (main entry point)
├── requirements.txt
├── pyproject.toml
├── games.md             # Game-specific notes
├── tensorboard.md       # TensorBoard usage
│
└── code/
    ├── conf/
    │   ├── grid.yaml    # Global training config (games, models, personas, skills)
    │   ├── algo/        # Algorithm hyperparameters (ppo, rppo, trpo, a2c)
    │   ├── game/        # Game configs (snake, flappy, pong, dk)
    │   ├── reward/      # Persona reward configs
    │   │   ├── snake_baseline.yaml
    │   │   ├── flappy_baseline.yaml
    │   │   ├── pong_baseline.yaml
    │   │   └── dk_baseline.yaml
    │   ├── robustness/  # Difficulty overrides (easy / default / hard per game)
    │   └── callback/
    │
    ├── games/
    │   ├── snake_core.py
    │   ├── flappy_core.py
    │   ├── pong_core.py
    │   └── dk_core.py
    │
    ├── rewards/         # Reward function implementations
    ├── metrics/         # Per-game metrics collectors
    │   ├── snake_balance.py
    │   ├── flappy_balance.py
    │   ├── pong_balance.py
    │   └── dk_balance.py
    │
    ├── wrappers/
    │   └── generic_env.py  # Universal Gym wrapper (reward fn, HUD, dt, apple timeout)
    │
    └── scripts/
        ├── train.py
        ├── evaluate.py
        ├── manual_play.py
        ├── watch_agent.py
        ├── random_eval.py
        ├── record_gameplay.py
        ├── analyze_metrics.py
        ├── metrics_utils.py
        ├── callbacks.py
        └── play.py
```
`code/conf/grid.yaml` is the central config for grid-based experiments. Key fields:
```yaml
games: [flappy, snake, pong, dk]
models: [ppo, rppo, trpo, a2c]
personas: [flappy_baseline, snake_baseline, pong_baseline, dk_baseline]
seed: 1234
device: cpu
n_envs: 10
skills:
  Custom: 10000000   # timesteps; override with +skills.Custom=N
```

Each game has three difficulty configs in `code/conf/robustness/`:
- `<game>_default.yaml`: training conditions
- `<game>_easy.yaml`: forgiving settings
- `<game>_hard.yaml`: punishing settings
These override game parameters (grid size, speed, penalties, etc.) at evaluation and manual play time.
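For illustration only, a hard preset for Snake might look like the following sketch; these keys are guesses for the example, the real ones are whatever the game config in `code/conf/game/snake.yaml` defines:

```yaml
# code/conf/robustness/snake_hard.yaml -- illustrative sketch; the actual
# keys are defined by the game config it overrides.
grid_size: 10        # smaller arena than the default
speed: 1.5           # faster tick rate
death_penalty: -2.0  # harsher terminal penalty
```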
Persona reward configs live in `code/conf/reward/` and are named `<game>_<persona>.yaml`. They control reward shaping via a pluggable reward function passed into `GameEnv`.
Algorithm configs live in `code/conf/algo/`. Each file sets the hyperparameters for its algorithm (learning rate, `n_steps`, batch size, etc.).
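As a sketch (the values below are generic PPO defaults, not the repo's actual numbers), `ppo.yaml` might contain:

```yaml
# code/conf/algo/ppo.yaml -- illustrative values only, not the repo's actual ones.
learning_rate: 3.0e-4
n_steps: 2048
batch_size: 64
gamma: 0.99
```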
Via menu (option 1, 8, or 9), or directly:
```
python -m code.scripts.train game=snake model=ppo persona=snake_baseline skill=Custom +skills.Custom=5000000
```

Outputs:
- Best model: `models/best/<game>_<algo>_<persona>_<skill>/best_model.zip`
- Checkpoints: `models/checkpoints/`
- TensorBoard logs: `mylogs/`
- Eval logs: `models/eval_logs/`
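The algorithm list (PPO, RecurrentPPO, TRPO, A2C) matches Stable-Baselines3 plus sb3-contrib, and `best_model.zip` matches SB3's naming convention; assuming that is the stack here, a saved model can be reloaded for ad-hoc analysis roughly like this:

```python
# Hedged sketch: assumes best_model.zip is a Stable-Baselines3 checkpoint.
from stable_baselines3 import PPO

model = PPO.load("models/best/snake_ppo_snake_baseline_Custom/best_model")

# Inference would use an observation from the project's GameEnv wrapper;
# commented out here only to illustrate the call:
# action, _state = model.predict(obs, deterministic=True)
```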
Via menu (option 2), or directly:
```
python -m code.scripts.evaluate --game snake --algo ppo --difficulty default --episodes 100
```

Outputs a CSV: `<game>_<algo>_eval.csv`
Columns: `episode`, `game`, `algo`, `difficulty`, `score`, `apples`, `pipes`, `episode_return`, `episode_steps`, `terminated`, `truncated`, `forced_timeout`

Difficulty options: `easy`, `default`, `hard`, `all`
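Because the CSV schema is fixed, quick summaries are straightforward. For example, a pandas sketch (the file name assumes a Snake/PPO run sitting in the working directory):

```python
# Summarise an evaluation CSV produced by evaluate.py.
import pandas as pd

df = pd.read_csv("snake_ppo_eval.csv")

# Mean score and episode length per difficulty, plus the share of episodes
# ended by the inactivity timeout (assuming forced_timeout is boolean/0-1).
summary = df.groupby("difficulty").agg(
    mean_score=("score", "mean"),
    mean_steps=("episode_steps", "mean"),
    timeout_rate=("forced_timeout", "mean"),
)
print(summary)
```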
Controls:
- Snake: W/Up, S/Down, A/Left, D/Right, ESC
- Flappy: SPACE = flap, ESC
- Pong: W/Up = up, S/Down = down, ESC
- DK: W/A/S/D = move, SPACE = jump, ESC
Renders a trained model playing in real time; you pick the game, model, and difficulty from the menu.
Runs a random-action agent for comparison. Same CSV format as evaluation.
Saves an MP4 or GIF of a trained agent or manual play session. Output goes to outputs/.
Generates plots from training and evaluation CSVs. All outputs written to models/metrics/.
Available graphs:
- Reward vs timesteps (training curves)
- Score (apples / pipes) vs timesteps
- Evaluation bar charts comparing algorithms
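These are produced by `analyze_metrics.py`. As a rough illustration of the third graph type (a sketch, not the script's actual code), an algorithm-comparison bar chart could be built like this:

```python
# Sketch of an algorithm-comparison bar chart built from eval CSVs.
import pandas as pd
import matplotlib.pyplot as plt

algos = ["ppo", "rppo", "trpo", "a2c"]
means = [pd.read_csv(f"snake_{a}_eval.csv")["score"].mean() for a in algos]

plt.bar(algos, means)
plt.ylabel("Mean evaluation score")
plt.title("Snake: algorithm comparison")
plt.savefig("models/metrics/snake_algo_comparison.png", dpi=150)
```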
- Create `code/conf/reward/<game>_<persona>.yaml`
- Implement the reward logic in `code/rewards/` (a sketch follows this list)
- Add the persona name to `grid.yaml`
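As promised above, a rough sketch of a persona reward function; the `(prev_state, state, done)` signature and the state fields are assumptions of this example, not the project's actual interface:

```python
# Illustrative persona reward for Snake; the signature and state fields
# are assumptions for this sketch, not the project's real interface.
def cautious_snake_reward(prev_state, state, done):
    reward = 0.0
    if state["apples"] > prev_state["apples"]:
        reward += 1.0       # reward eating an apple
    if done:
        reward -= 1.0       # penalise collision or timeout
    reward -= 0.001         # small per-step cost to discourage stalling
    return reward
```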
- Implement `code/games/<game>_core.py` (must expose `get_action_space`, `get_observation_space`, `reset`, `step`, `render`); a skeleton follows this list
- Add `code/conf/game/<game>.yaml` with `_target_` pointing to the core class
- Add robustness configs: `code/conf/robustness/<game>_default/easy/hard.yaml`
- Add a persona config and metrics collector
- Register the game in `grid.yaml`
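A minimal core skeleton might look like the following (Gymnasium spaces and the `step` return shape are assumptions of this sketch; the real contract is set by `generic_env.py`):

```python
# Skeleton for code/games/<game>_core.py. Only the five methods named
# above are required; space types and return shapes are assumptions.
import numpy as np
from gymnasium import spaces


class MyGameCore:
    def get_action_space(self):
        return spaces.Discrete(4)  # e.g. four movement directions

    def get_observation_space(self):
        return spaces.Box(low=0.0, high=1.0, shape=(16,), dtype=np.float32)

    def reset(self):
        # Return the initial observation.
        return np.zeros(16, dtype=np.float32)

    def step(self, action):
        # Advance one tick; return (obs, info, done) -- the exact
        # contract is whatever generic_env.py expects.
        obs = np.zeros(16, dtype=np.float32)
        return obs, {}, False

    def render(self):
        # Draw the current frame; the rendering backend is project-specific.
        pass
```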
- `games.md`: per-game design notes
- `tensorboard.md`: how to launch and read TensorBoard