fix: average reward over agents in compute_system_rewards #7

Open
Dreamstick9 wants to merge 1 commit into SakanaAI:main from Dreamstick9:fix/compute-system-rewards-averaging

Conversation

@Dreamstick9

Fixes: #4

Bug: compute_system_rewards returns sum instead of average

Population.compute_system_rewards is supposed to return the average of per-agent morphology statistics, but always returns the raw sum, inflating every value by a factor of N (number of agents).

Root Cause

The averaging loop has two bugs:

  1. Wrong variable — divides agent_reward (local per-agent dict) instead of reward (accumulated total)
  2. Wrong scope — sits inside the for agent loop, so the result is overwritten and discarded on the next iteration

# before (buggy): core/population.py
reward = defaultdict(float)  
for agent in self.agents:  
    agent_reward = agent.compute_morphology_statistics()  
    for k, v in agent_reward.items():  
        reward[k] += v  
    # average  
    for k in agent_reward.keys():  
        agent_reward[k] = agent_reward[k] / len(self.agents)  # wrong variable, wrong scope
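
To make the symptom concrete, here is a minimal reproduction of the buggy loop outside the repository. `StubAgent` and `buggy_system_rewards` are hypothetical stand-ins written for this sketch, not names from the codebase; only the loop body mirrors the snippet above.

```python
from collections import defaultdict


class StubAgent:
    """Hypothetical stand-in for an agent; returns fixed statistics."""

    def __init__(self, stats):
        self._stats = stats

    def compute_morphology_statistics(self):
        # Return a fresh dict on every call, like a per-agent result.
        return dict(self._stats)


def buggy_system_rewards(agents):
    """Mirror of the buggy loop: the division never touches `reward`."""
    reward = defaultdict(float)
    for agent in agents:
        agent_reward = agent.compute_morphology_statistics()
        for k, v in agent_reward.items():
            reward[k] += v
        # Divides the local dict, which is thrown away on the next iteration.
        for k in agent_reward.keys():
            agent_reward[k] = agent_reward[k] / len(agents)
    return dict(reward)


agents = [StubAgent({"complexity": 2.0}), StubAgent({"complexity": 4.0})]
result = buggy_system_rewards(agents)
print(result)  # {'complexity': 6.0}, the sum; the mean would be 3.0
```

With two agents the returned value is exactly twice the mean, which matches the factor-of-N inflation described above.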

Fix

Move the averaging loop outside the for agent loop and apply it to reward:

# after (fixed)
reward = defaultdict(float)  
for agent in self.agents:  
    agent_reward = agent.compute_morphology_statistics()  
    for k, v in agent_reward.items():  
        reward[k] += v  
# average
for k in reward.keys():  
    reward[k] = reward[k] / len(self.agents)
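
The same accumulate-then-divide pattern can be checked in isolation. The sketch below assumes the per-agent statistics are already available as plain dicts; `average_rewards` is a hypothetical helper for illustration, not a function in the repository.

```python
from collections import defaultdict


def average_rewards(per_agent_stats):
    """Sum each key across agents, then divide once after the loop."""
    reward = defaultdict(float)
    for agent_reward in per_agent_stats:
        for k, v in agent_reward.items():
            reward[k] += v
    # Averaging happens outside the accumulation loop and on the accumulator.
    n = len(per_agent_stats)
    for k in reward.keys():
        reward[k] = reward[k] / n
    return dict(reward)


stats = [
    {"complexity": 2.0, "transfers": 1.0},
    {"complexity": 4.0, "transfers": 3.0},
]
averaged = average_rewards(stats)
print(averaged)  # {'complexity': 3.0, 'transfers': 2.0}
```

One property of dividing by the agent count: a key that some agent does not report is effectively treated as zero for that agent, because the divisor is always the full count.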

Impact

All keys returned by compute_system_rewards (paradigms, stem_alternate_patterns, phonetic_non_confusability, stem_alternation_entropy, complexity, transfers, and the derived total) are N× too large in any run with more than one agent.

The existing test (test_compute_system_rewards_presence_and_bounds) only checks key presence and that values are finite, so this was not caught automatically.
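
A value-level check would have caught this. The following is a rough sketch of such an assertion, written against plain dicts because the setup of the real test is not shown here; `assert_rewards_are_means` is a hypothetical helper, and it only checks keys that appear in the per-agent statistics, since how the derived total is computed is not specified in this PR.

```python
import math


def assert_rewards_are_means(system_rewards, per_agent_stats, rel_tol=1e-9):
    """Assert each system reward equals the mean of the per-agent values."""
    n = len(per_agent_stats)
    for key in per_agent_stats[0]:
        expected = sum(stats[key] for stats in per_agent_stats) / n
        assert math.isclose(system_rewards[key], expected, rel_tol=rel_tol), (
            f"{key}: got {system_rewards[key]}, expected mean {expected}"
        )


per_agent = [{"complexity": 2.0}, {"complexity": 4.0}]

# A correctly averaged result passes silently.
assert_rewards_are_means({"complexity": 3.0}, per_agent)

# The raw sum (the buggy behaviour) is rejected.
try:
    assert_rewards_are_means({"complexity": 6.0}, per_agent)
    caught = False
except AssertionError:
    caught = True
print(caught)  # True
```

In the real test suite the two arguments would presumably come from compute_system_rewards and from calling compute_morphology_statistics on each agent, using a population with at least two agents so that sum and mean differ.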

The averaging loop was inside the for-agent loop and modified
agent_reward (the local per-agent dict) instead of reward
(the accumulated total). This meant reward always held the raw
sum of all agents' stats, inflating all values by N (num agents).

Fix: move the averaging loop outside the for-agent loop and
apply it to reward instead of agent_reward.
Development

Successfully merging this pull request may close these issues.

BUG: correct averaging bug in compute_system_rewards
