CreativeGame is a multi-agent system for iterative HTML5 game generation that shifts the focus from "single-shot" code generation to "mechanic-aware" evolutionary design. It uses 10 specialized executable roles to turn high-level prompts into complex, playable games through a structured pipeline of planning, coding, and validation.
TL;DR
CreativeGame is a sophisticated multi-agent framework that treats game generation as a process of structural evolution. By decomposing the task into specialized agents and anchoring the reward system in deterministic code-level signals, it enables the generation of games that don't just "look" like games but possess novel, functional mechanics that improve over iterative versions (lineages).
The Problem: The "Generic Template" Trap
If you ask a standard LLM to "make a creative game," you usually get one of two things: a broken script or a boring clone of Pong or Flappy Bird. The reasons are threefold:
- Subjectivity: LLMs are poor judges of their own "creativity," often giving everything a default 7/10 score.
- Brittleness: Code that looks correct often fails at runtime due to missing event listeners or broken game loops.
- Amnesia: The model doesn't remember what worked in "Version 1" when it's building "Version 2."
CreativeGame addresses this by elevating Mechanics—the local rule structures that define play—to first-class citizens in the generation pipeline.
Methodology: The Logic of Evolution
The system is a pipeline of 10 executable roles, ranging from "Skeleton" coders to "Refinement" agents.
1. Mechanic-Guided Planning
Instead of jumping straight to code, the system first retrieves concrete "mechanic objects" from a 774-entry archive and builds an explicit Mechanic Plan. If the plan says "Variable Gravity," the code generation stage is held accountable to that specific goal.
Figure 1: The Multi-agent orchestration loop featuring the explicit Mechanic planning and refinement stages.
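The planning step can be pictured with a minimal sketch. Everything below is an assumption made for illustration: the `Mechanic` and `MechanicPlan` classes, the tag-overlap ranking in `retrieve`, and the three archive entries are hypothetical stand-ins, not the system's actual data model or retrieval method.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mechanic:
    """One archive entry. Field names are illustrative, not from the source."""
    name: str
    tags: frozenset


@dataclass
class MechanicPlan:
    """Explicit list of mechanics that later stages can be checked against."""
    prompt: str
    mechanics: list = field(default_factory=list)


def retrieve(archive, wanted, k=2):
    """Rank entries by tag overlap with the request and keep the top k.

    Tag overlap is a placeholder for whatever retrieval the real system uses.
    """
    scored = [(len(m.tags & wanted), m) for m in archive]
    scored = [(s, m) for s, m in scored if s > 0]
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [m for _, m in scored[:k]]


# A toy three-entry archive (the real one is described as having 774 entries).
archive = [
    Mechanic("Variable Gravity", frozenset({"physics", "platformer"})),
    Mechanic("Time Rewind", frozenset({"time", "puzzle"})),
    Mechanic("Wall Jump", frozenset({"platformer", "movement"})),
]

plan = MechanicPlan(
    prompt="a platformer with strange physics",
    mechanics=retrieve(archive, {"physics", "platformer"}),
)
print([m.name for m in plan.mechanics])  # ['Variable Gravity', 'Wall Jump']
```

The point of keeping the plan as data rather than prose is that a later stage can compare the generated code against `plan.mechanics` entry by entry.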
2. CreativeProxyReward: Killing the Hype
To guard against Goodhart’s Law (here, the model writing "creative-sounding" descriptions for boring games), the authors use a weighted formula in which 65% of the score comes from deterministic signals:
- Mechanic Realization: Did you actually code the plan?
- Structural Change: Did you change the core rules, or just the colors?
- Runtime Hard-Gate: If the code doesn't execute in a browser, its reward is slashed by 50%.
Figure 2: The CreativeProxyReward weights prioritize structural and programmatic success over LLM-judged "creativity."
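A rough sketch of how such a reward could be combined is shown below. Only the 65% deterministic share and the 50% runtime penalty come from the description above. The split of that 65% between the two deterministic signals, the assumption that the remaining 35% goes to an LLM judge, and all names (`Signals`, `creative_proxy_reward`) are hypothetical.

```python
from dataclasses import dataclass

# Only the 65% deterministic share and the 50% penalty are from the text.
# The individual weights below are assumptions chosen to sum to 1.0.
W_REALIZATION = 0.35    # hypothetical share for Mechanic Realization
W_STRUCTURAL = 0.30     # hypothetical share for Structural Change
W_LLM_JUDGE = 0.35      # assumed remainder for LLM-judged creativity
RUNTIME_PENALTY = 0.5   # reward multiplier when the game fails to execute


@dataclass
class Signals:
    mechanic_realization: float  # fraction of planned mechanics found in code, 0..1
    structural_change: float     # how much the core rules changed, 0..1
    llm_judge: float             # model-judged creativity, rescaled to 0..1
    runs_in_browser: bool        # outcome of the runtime check


def creative_proxy_reward(s: Signals) -> float:
    """Weighted sum of the signals, halved if the runtime hard-gate fails."""
    score = (
        W_REALIZATION * s.mechanic_realization
        + W_STRUCTURAL * s.structural_change
        + W_LLM_JUDGE * s.llm_judge
    )
    if not s.runs_in_browser:
        score *= RUNTIME_PENALTY
    return score


hype_only = Signals(0.0, 0.0, 1.0, True)   # great pitch, nothing implemented
solid = Signals(1.0, 0.5, 0.5, True)       # plan realized, modest judge score
broken = Signals(1.0, 0.5, 0.5, False)     # same game, but it does not run

for label, s in [("hype_only", hype_only), ("solid", solid), ("broken", broken)]:
    print(label, round(creative_proxy_reward(s), 4))
```

Under these assumed weights, a game that earns a perfect judge score but implements nothing still scores below one that realizes its plan, and a non-running game earns exactly half of what it would otherwise.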
3. Lineage-Aware Memory
Borrowing from MemRL (Reinforcement Learning on Episodic Memory), CreativeGame stores experience in "Lineages." In a lineage of Plants vs. Zombies, the "v4" agent has access to the successes and failures of "v1-v3," allowing for a cohesive accumulation of design ideas.
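One way to picture a lineage is as an append-only list of version records that gets rendered into context for the next agent. The sketch below is an assumption about shape only: `Lineage`, `VersionRecord`, their fields, and the example notes are invented for illustration and do not reflect the system's real storage format or MemRL's update rules.

```python
from dataclasses import dataclass, field


@dataclass
class VersionRecord:
    version: int
    summary: str
    reward: float
    worked: list = field(default_factory=list)  # ideas worth keeping
    failed: list = field(default_factory=list)  # ideas to avoid repeating


@dataclass
class Lineage:
    name: str
    history: list = field(default_factory=list)

    def record(self, summary, reward, worked=(), failed=()):
        """Append the next version; numbering starts at 1."""
        rec = VersionRecord(
            len(self.history) + 1, summary, reward, list(worked), list(failed)
        )
        self.history.append(rec)
        return rec

    def context_for_next(self):
        """Render all earlier versions as text for the next version's prompt."""
        lines = [f"Lineage: {self.name}"]
        for r in self.history:
            lines.append(f"v{r.version} (reward {r.reward:.2f}): {r.summary}")
            lines += [f"  keep: {item}" for item in r.worked]
            lines += [f"  avoid: {item}" for item in r.failed]
        return "\n".join(lines)


# Illustrative entries only; these notes are made up for the example.
lineage = Lineage("Plants vs. Zombies")
lineage.record("baseline lane defense", 0.40, worked=["grid placement"])
lineage.record("added resource economy", 0.55, failed=["unbounded spawn rate"])
lineage.record("added lane-switching enemies", 0.62, worked=["enemy pathing"])

# This text is what a hypothetical "v4" agent would receive.
print(lineage.context_for_next())
```

Keeping successes and failures as separate fields lets the next version's prompt state both what to preserve and what to steer away from.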
Results: From Mimicry to Innovation
The most striking evidence of the system's success is found in its case studies. For instance, in a 4-generation evolution of a Flappy Bird style game:
- v1: Simple obstacle dodging.
- v4: Reinterpreted as "Route Writing," where perfect passes through gates actually rewrite the geometry of future obstacles.
Performance Metrics
- Reliability: Success rate boosted to >98% via a Tier-1 Deep Static Analyzer and Tier-2 Browser Execution check.
- Scale: The system managed a global archive of 774 unique mechanics.
- Efficiency: The "Visual" generation stage consumes the most tokens (~34%), emphasizing that polish is computationally expensive, while the core mechanic planning is relatively lean (~8%).
Figure 3: A live-execution grid showing the evolution of four distinct lineages from Round 1 to Round 4.
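To make the first validation tier concrete, here is a deliberately simple static check for the failure modes named earlier (missing event listeners, missing game loop). It is a toy stand-in, far shallower than a "Deep Static Analyzer": the function name `static_check`, the regex heuristics, and the two sample pages are all assumptions. The second tier would need a real browser and is not sketched here.

```python
import re
from html.parser import HTMLParser


class _ScriptCollector(HTMLParser):
    """Collects the text of inline <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.scripts.append(data)


def static_check(html):
    """Return a list of problems found without executing the page."""
    parser = _ScriptCollector()
    parser.feed(html)
    parser.close()
    js = "\n".join(parser.scripts)

    problems = []
    if not js.strip():
        problems.append("no inline script")
    # Heuristic: a game loop usually calls one of these timing functions.
    if not re.search(r"\b(requestAnimationFrame|setInterval)\s*\(", js):
        problems.append("no game loop")
    # Heuristic: input is wired via addEventListener or an on<event> assignment.
    if not re.search(r"\baddEventListener\s*\(|\bon(key|mouse|touch|click)\w*\s*=", js):
        problems.append("no input handler")
    return problems


GOOD = """<html><body><canvas id="c"></canvas><script>
document.addEventListener('keydown', function (e) {});
function loop() { requestAnimationFrame(loop); }
loop();
</script></body></html>"""

BAD = """<html><body><script>
function loop() { requestAnimationFrame(loop); }
loop();
</script></body></html>"""

print(static_check(GOOD))  # []
print(static_check(BAD))   # ['no input handler']
```

A cheap check like this can reject obviously incomplete pages before paying for a browser launch, which is the usual motivation for ordering a static tier ahead of an execution tier.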
Critical Analysis & Takeaways
The brilliance of CreativeGame lies in its Formal Foundation. By giving a game a formal mathematical definition, the authors provide the LLM with a structural map of what to change.
Limitations: The system still relies on LLM backends (GPT/Kimi), so it is bounded by the reasoning capabilities of those base models. Furthermore, "Playability" scores are still model-based proxies and haven't been fully validated against human "fun" metrics.
Future Outlook: This framework sets a blueprint for "Self-Evolving Content." Imagine a game that observes its own failure through the runtime validator and automatically "evolves" its next version to be more robust and structurally complex. CreativeGame isn't just generating code; it's simulating a digital game designer.
