LLM Creative Coding Challenges

Powered by Stock42

This repository is an open benchmark collection designed to test how different LLM providers, coding agents, and models solve the same creative coding challenges.

The goal is to generate real, comparable results that help the developer community make better technical decisions when choosing LLMs, agents, and AI-assisted development workflows.

This is not a synthetic leaderboard.

This is a practical repository of real outputs generated from real prompts.

Purpose

LLMs are changing extremely fast. New models, agents, IDE integrations, CLI tools, and coding workflows appear constantly.

This repository exists to answer practical questions:

Which models generate better working code?
Which agents produce cleaner implementations?
Which LLMs follow constraints more reliably?
Which tools create better visual and interactive results?
Which models require fewer manual fixes?
Which outputs are actually useful for developers?

Each challenge in this repository is based on a fixed prompt.

Anyone can run the same prompt using a different agent or LLM and submit the result.

Repository Structure

The repository is organized into two main areas:

.
├── promptings/
│   ├── triple-pendulum.md
│   ├── 1kg-block-bounces.md
│   ├── balls-fall.md
│   └── tetris-3d.md
├── results/
│   └── $userNameGithub/
│       └── $agentName/
│           └── $llm/
│               └── $challengeName/
│                   ├── index.html
│                   └── README.md
└── README.md

Promptings

All benchmark prompts live inside the promptings directory.

Each prompting is a standalone challenge file.

Current promptings:

| Challenge | Prompt File | Description |
| --- | --- | --- |
| Triple Pendulum | `promptings/triple-pendulum.md` | A triple pendulum swings into chaos and paints glowing trails with its tip. |
| 1kg Block Bounces | `promptings/1kg-block-bounces.md` | A 1 kg block bounces between a wall and a heavy block while collisions reveal pi-related counts. |
| Balls Fall | `promptings/balls-fall.md` | Balls fall through a Galton board and accumulate into a bell-curve distribution. |
| Tetris 3D | `promptings/tetris-3d.md` | A playable 3D-style Tetris game with generated pixel sound, explosions, levels, and localStorage scoring. |

More promptings will be added over time.

The objective is to build a large collection of practical LLM coding challenges.

Results Structure

Every submitted result must follow this path pattern:

results/$userNameGithub/$agentName/$llm/$challengeName/

Example:

results/cesarcasas/opencode/deepseek-v4-pro/triple-pendulum/

Inside each result folder, include at least:

index.html
README.md

Example:

results/cesarcasas/opencode/deepseek-v4-pro/triple-pendulum/
├── index.html
└── README.md

The index.html file must contain the generated solution.

For single-file HTML challenges, the generated output must remain a single self-contained HTML file.
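
The minimum folder contents can be checked before opening a PR. The following is a minimal sketch, not part of the repository tooling; the function names `missing_required_files` and `missing_in_folder` are hypothetical.

```python
from pathlib import Path

# The two files every result folder must contain, per the rule above.
REQUIRED_FILES = ("index.html", "README.md")


def missing_required_files(file_names):
    """Return the required file names that are absent from file_names."""
    present = set(file_names)
    return [name for name in REQUIRED_FILES if name not in present]


def missing_in_folder(result_dir):
    """Same check, reading the file names from a folder on disk."""
    folder = Path(result_dir)
    return missing_required_files(p.name for p in folder.iterdir() if p.is_file())
```

An empty list means the folder satisfies the minimum; extra files are not reported.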

Naming Rules

Use lowercase folder names.

Use hyphens instead of spaces.

Recommended format:

results/github-username/agent-name/model-name/challenge-name/

Valid examples:

results/cesarcasas/opencode/deepseek-v4-pro/triple-pendulum/
results/devexample/cursor/claude-sonnet-4/triple-pendulum/
results/janedoe/chatgpt/gpt-5.5-thinking/triple-pendulum/
results/alexdev/windsurf/qwen3-coder/triple-pendulum/

Invalid examples:

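results/opencode/deepseek-v4-pro/ (missing GitHub username and challenge name)
results/cesarcasas/deepseek-v4-pro/triple-pendulum/ (missing agent name)
results/cesarcasas/OpenCode/DeepSeek V4 Pro/triple-pendulum/ (uppercase letters and spaces)
results/cesarcasas/opencode/deepseek-v4-pro/ (missing challenge name)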

The complete structure must always include:

results / GitHub username / agent name / LLM name / challenge name
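
The naming rules above can be expressed as a small validator. This is a sketch under assumptions: the repository does not define an exact character set, so the `SEGMENT` pattern (lowercase letters, digits, dots, underscores, hyphens) is a guess that accepts every valid example listed above, and the function names are hypothetical.

```python
import re

# Assumed segment rule: lowercase letters, digits, dots, underscores, hyphens.
SEGMENT = re.compile(r"[a-z0-9][a-z0-9._-]*")
FIELDS = ("github_username", "agent", "llm", "challenge")


def parse_result_path(path):
    """Split a result path into its four fields, or raise ValueError."""
    parts = path.strip("/").split("/")
    if len(parts) != 5 or parts[0] != "results":
        raise ValueError(
            f"expected results/<user>/<agent>/<llm>/<challenge>/, got {path!r}"
        )
    for name, value in zip(FIELDS, parts[1:]):
        if not SEGMENT.fullmatch(value):
            raise ValueError(
                f"{name} segment {value!r} must be lowercase, "
                "with hyphens instead of spaces"
            )
    return dict(zip(FIELDS, parts[1:]))


def is_valid_result_path(path):
    """Boolean wrapper around parse_result_path."""
    try:
        parse_result_path(path)
    except ValueError:
        return False
    return True
```

The check only counts segments, so it cannot tell which segment is missing from a short path; it reports that the path does not have the full five-part shape.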

Current Challenges

Available challenges:

triple-pendulum
1kg-block-bounces
balls-fall
tetris-3d

Prompt files:

promptings/triple-pendulum.md
promptings/1kg-block-bounces.md
promptings/balls-fall.md
promptings/tetris-3d.md

Challenge objectives:

triple-pendulum: A triple pendulum swings into chaos and paints glowing trails with its tip.
1kg-block-bounces: A 1 kg block bounces between a wall and a 100,000 kg block, with elastic collisions counted and interpreted honestly against pi.
balls-fall: Balls fall through a grid of pegs and pile into bins, forming a bell curve with live histogram statistics.
tetris-3d: A playable 3D-style Tetris game with pixel sound effects, explosive line clears, 10 levels, and persistent localStorage scoring.
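
When reviewing a 1kg-block-bounces result, it helps to have a reference collision count to compare against. The sketch below is not taken from the prompt file. It assumes the usual setup for this puzzle: the small block starts at rest between the wall and the large block, the large block approaches at unit speed, and every collision is perfectly elastic. The function name `count_collisions` is hypothetical.

```python
def count_collisions(small_mass, large_mass):
    """Count block-block and block-wall collisions for the assumed setup.

    The wall is on the left, the small block is in the middle, and the
    large block starts on the right moving left. Positions are not needed:
    the order of events follows from the velocities alone.
    """
    v_small, v_large = 0.0, -1.0
    total = small_mass + large_mass
    count = 0
    while True:
        if v_small > v_large:
            # The blocks are closing in on each other: elastic collision.
            new_small = ((small_mass - large_mass) * v_small
                         + 2 * large_mass * v_large) / total
            new_large = ((large_mass - small_mass) * v_large
                         + 2 * small_mass * v_small) / total
            v_small, v_large = new_small, new_large
            count += 1
        elif v_small < 0:
            # The small block is heading for the wall: it bounces back.
            v_small = -v_small
            count += 1
        else:
            # Both blocks move right and the small one cannot catch up.
            return count


print(count_collisions(1, 100))      # 31
print(count_collisions(1, 100_000))  # 993
```

Mass ratios that are powers of 100 give counts that spell out leading digits of pi (31, 314, ...). The 100,000 kg case is not a power of 100, so under these assumptions the count is 993, close to pi times the square root of 100,000 (about 993.46), which is the kind of honest interpretation the objective asks for.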

The expected output is a single self-contained HTML file using:

HTML
CSS
Vanilla JavaScript
Canvas

No external libraries, frameworks, CDNs, or assets are allowed.
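
A rough automated check for this rule is possible with the Python standard library. The following is a heuristic sketch, not an official validator: it only looks at `src` and `href` attributes on a hand-picked set of tags (the `RESOURCE_ATTRS` table is an assumption), and it will not catch resources loaded through CSS `url()`, `@import`, or network calls inside scripts.

```python
from html.parser import HTMLParser

# Assumed list of tags that load a resource, and the attribute that names it.
RESOURCE_ATTRS = {
    "script": "src",
    "link": "href",
    "img": "src",
    "audio": "src",
    "video": "src",
    "source": "src",
    "iframe": "src",
}


class ExternalReferenceFinder(HTMLParser):
    """Collect (tag, value) pairs for resources that are not inlined."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name != wanted or not value:
                continue
            # data: URIs are embedded in the file, so they stay self-contained.
            if not value.strip().lower().startswith("data:"):
                self.found.append((tag, value))


def find_external_references(html_text):
    """Return every non-inlined resource reference found in html_text."""
    finder = ExternalReferenceFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found
```

An empty result does not prove the file is self-contained; a non-empty result is a strong hint that it is not.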

Local Result README

Each submitted result must include a local README.md inside its result folder.

Use this template:

# Test Result

## Challenge

triple-pendulum

## Contributor

GitHub username: your-github-username

## Agent

agent-name

## LLM

model-name

## Prompt File

promptings/triple-pendulum.md

## Prompt Version

v1

## Date

YYYY-MM-DD

## Generation Process

Generated in one shot.

Or:

Generated after multiple iterations.

## Manual Changes

No manual changes.

Or list the changes:

- Fixed a syntax error.
- Adjusted canvas resize behavior.
- Improved button event handling.

## Notes

Short notes about the quality of the result, issues found, visual quality, physics quality, or performance.
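
Filling in the template by hand is fine. For contributors who submit many results, a small generator can keep the sections consistent. This is a sketch only; the function `render_result_readme` and its parameters are hypothetical, and it assumes the prompt file name matches the challenge name.

```python
def render_result_readme(challenge, username, agent, llm, date,
                         one_shot=True, manual_changes=(), notes="",
                         prompt_version="v1"):
    """Render the local result README.md text from the template fields."""
    if one_shot:
        process = "Generated in one shot."
    else:
        process = "Generated after multiple iterations."
    # An empty list of changes becomes the fixed "No manual changes." line.
    changes = "\n".join(f"- {change}" for change in manual_changes)
    changes = changes or "No manual changes."
    sections = [
        ("Challenge", challenge),
        ("Contributor", f"GitHub username: {username}"),
        ("Agent", agent),
        ("LLM", llm),
        ("Prompt File", f"promptings/{challenge}.md"),
        ("Prompt Version", prompt_version),
        ("Date", date),
        ("Generation Process", process),
        ("Manual Changes", changes),
        ("Notes", notes),
    ]
    body = "\n\n".join(f"## {title}\n\n{text}" for title, text in sections)
    return f"# Test Result\n\n{body}\n"
```

The returned string can be written to `README.md` inside the result folder.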

Contribution Rules

When submitting a result:

Use an existing prompt from the promptings directory.
Do not modify the original prompt.
Save the generated result under the required results/$userNameGithub/$agentName/$llm/$challengeName/ structure.
Include the generated output file.
Include a local result README.md.
Document whether the result was generated in one shot or after multiple iterations.
Document any manual changes.
Do not submit private API keys, provider tokens, or credentials.
Do not overwrite another contributor’s result.
Do not rename existing result folders unless fixing a naming convention issue.

Manual Fix Policy

The preferred benchmark mode is:

zero manual changes

However, manual fixes are allowed if they are documented.

Manual changes must be listed in the local result README.md.

Examples:

## Manual Changes

- Fixed a missing closing brace in JavaScript.
- Reconnected a broken UI control.
- Adjusted canvas scaling for high-DPI displays.

If there were no manual changes, write:

## Manual Changes

No manual changes.

Pull Request Guidelines

When opening a PR, include:

## Challenge

triple-pendulum

## Contributor

GitHub username: your-github-username

## Agent

opencode

## LLM

deepseek-v4-pro

## Result Path

results/your-github-username/opencode/deepseek-v4-pro/triple-pendulum/

## Prompt File

promptings/triple-pendulum.md

## Prompt Version

v1

## Generation Process

Generated in one shot.

## Manual Changes

No manual changes.

## Notes

Short subjective evaluation of the result.

Evaluation Criteria

Each challenge can define its own scoring system.

However, all results should generally be reviewed using these criteria:

| Category | Description |
| --- | --- |
| Prompt compliance | Did the model follow the instructions? |
| Correctness | Does the generated output work? |
| Completeness | Did it implement all required features? |
| Code quality | Is the code readable and maintainable? |
| Visual quality | Is the result polished and impressive? |
| UX quality | Are controls and interactions usable? |
| Performance | Does it run smoothly? |
| Creativity | Did the model produce something memorable? |
| Manual fixes | Did it require human correction? |

Suggested Review Format

Reviewers can use this optional scorecard:

| Category | Max Score |
| --- | --- |
| Prompt compliance | 15 |
| Functional correctness | 15 |
| Feature completeness | 15 |
| Code quality | 10 |
| Visual quality | 15 |
| UX and controls | 10 |
| Performance | 10 |
| Creativity | 10 |
| Total | 100 |
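
The scorecard above can be totaled with a few lines of code. This is an illustrative sketch: the `MAX_SCORES` values are copied from the scorecard, while the function name `total_score` and the strict validation are assumptions, not a required review tool.

```python
# Maximum score per category, copied from the optional scorecard.
MAX_SCORES = {
    "Prompt compliance": 15,
    "Functional correctness": 15,
    "Feature completeness": 15,
    "Code quality": 10,
    "Visual quality": 15,
    "UX and controls": 10,
    "Performance": 10,
    "Creativity": 10,
}


def total_score(scores):
    """Sum a review's scores after checking categories and ranges."""
    if set(scores) != set(MAX_SCORES):
        raise ValueError("scores must cover exactly the scorecard categories")
    for category, value in scores.items():
        if not 0 <= value <= MAX_SCORES[category]:
            raise ValueError(
                f"{category} must be between 0 and {MAX_SCORES[category]}"
            )
    return sum(scores.values())
```

A review that awards the maximum in every category totals 100, matching the last row of the scorecard.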

Running a Result

For single-file HTML challenges, open the index.html file directly in a browser.

Example:

results/cesarcasas/opencode/deepseek-v4-pro/triple-pendulum/index.html

No build step should be required.

No package manager should be required.

No server should be required.

Why Stock42 Supports This

Stock42 is an AI-First software company focused on building real products, developer tools, agentic platforms, and AI-assisted workflows.

We believe the developer community needs practical, transparent, reproducible examples to understand how different LLMs and agents perform in real-world development tasks.

This repository is part of that effort.

The objective is not to promote a single provider.

The objective is to help developers compare real outputs, understand trade-offs, and make better decisions.

License

MIT License.

Generated files are submitted for benchmarking, educational, and comparative purposes.

Final Objective

This repository should become a practical reference for comparing LLM coding capabilities across many types of challenges.

Each prompt is a test.

Each result is evidence.

See all results in RESULTS.md.

Each PR helps the community understand what current AI coding tools can actually do.
