Summary
Population.compute_system_rewards always returns the raw sum of per-agent stats instead of the average, inflating every value by a factor of N (number of agents).
Bug
The averaging loop has two problems. It divides agent_reward (the local per-agent dict) instead of reward (the accumulated total), so the accumulated sums are never averaged. It also sits inside the for agent loop, so the divided values are thrown away when agent_reward is reassigned on the next iteration.
# buggy
for agent in self.agents:
    agent_reward = agent.compute_morphology_statistics()
    for k, v in agent_reward.items():
        reward[k] += v
    # average: wrong variable, wrong scope
    for k in agent_reward.keys():
        agent_reward[k] = agent_reward[k] / len(self.agents)
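
One possible fix is sketched below: move the averaging out of the for agent loop and apply it to reward rather than agent_reward. This is a minimal, self-contained sketch and not the project's actual code. The Agent and Population classes here are hypothetical stand-ins, and the use of a defaultdict for reward and the empty-population guard are assumptions, since the original snippet does not show how reward is initialised or returned.

```python
from collections import defaultdict


class Agent:
    """Hypothetical stand-in for the real agent class."""

    def __init__(self, stats):
        self._stats = stats

    def compute_morphology_statistics(self):
        # Return a copy so callers cannot mutate the agent's stats.
        return dict(self._stats)


class Population:
    """Hypothetical stand-in; only the reward aggregation is sketched."""

    def __init__(self, agents):
        self.agents = agents

    def compute_system_rewards(self):
        # Assumption: reward starts at zero for every key.
        reward = defaultdict(float)

        # Accumulate per-agent stats into the running total.
        for agent in self.agents:
            agent_reward = agent.compute_morphology_statistics()
            for k, v in agent_reward.items():
                reward[k] += v

        # Average once, after the loop, on the accumulated dict.
        n = len(self.agents)
        if n == 0:
            # Assumption: an empty population yields no rewards.
            return {}
        return {k: total / n for k, total in reward.items()}
```

With two agents whose "size" stats are 2.0 and 4.0, this version returns 3.0 for "size", where the buggy loop would leave the sum 6.0 in reward.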