CreativeGame: Moving Beyond Clones to Mechanic-Aware Evolution

CreativeGame is a multi-agent system designed for iterative HTML5 game generation, shifting the focus from "single-shot" code generation to "mechanic-aware" evolutionary design. It utilizes a group of 10 specialized executable roles to transform high-level prompts into complex, playable games through a structured pipeline of planning, coding, and validation.

TL;DR

CreativeGame is a sophisticated multi-agent framework that treats game generation as a process of structural evolution. By decomposing the task into specialized agents and anchoring the reward system in deterministic code-level signals, it enables the generation of games that don't just "look" like games but possess novel, functional mechanics that improve over iterative versions (lineages).

The Problem: The "Generic Template" Trap

If you ask a standard LLM to "make a creative game," you usually get one of two things: a broken script or a boring clone of Pong or Flappy Bird. The reasons are threefold:

  1. Subjectivity: LLMs are poor judges of their own "creativity," often giving everything a default 7/10 score.
  2. Brittleness: Code that looks correct often fails at runtime due to missing event listeners or broken game loops.
  3. Amnesia: The model doesn't remember what worked in "Version 1" when it's building "Version 2."

CreativeGame addresses this by elevating Mechanics, the local rule structures that define play, to first-class citizens in the generation pipeline.

Methodology: The Logic of Evolution

The system architecture is a pipeline of 10 executable roles, ranging from "Skeleton" coders to "Refinement" agents.

1. Mechanic-Guided Planning

Instead of jumping straight to code, the system first retrieves concrete "mechanic objects" from a 774-entry archive and builds an explicit Mechanic Plan from them. If the plan says "Variable Gravity," the code generation stage is held accountable to that specific goal.
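A minimal sketch of what plan-then-check could look like, under stated assumptions. The `Mechanic` and `MechanicPlan` types, the tag-overlap `retrieve` function, and the sample archive entries are all hypothetical illustrations; the source does not describe how retrieval or the plan data structure is actually implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mechanic:
    """Hypothetical archive entry: a named mechanic with descriptive tags."""
    name: str
    tags: frozenset[str]


@dataclass
class MechanicPlan:
    """Hypothetical explicit plan that later stages are checked against."""
    prompt: str
    mechanics: list[Mechanic] = field(default_factory=list)

    def unmet(self, realized: set[str]) -> list[str]:
        """Names of planned mechanics that the generated code did not realize."""
        return [m.name for m in self.mechanics if m.name not in realized]


def retrieve(archive: list[Mechanic], wanted: set[str], k: int = 2) -> list[Mechanic]:
    """Toy retrieval (an assumption): rank entries by tag overlap with the request."""
    scored = [(len(m.tags & wanted), m) for m in archive]
    scored = [pair for pair in scored if pair[0] > 0]
    # sort() is stable, so ties keep their archive order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:k]]


# Illustrative three-entry archive; the real archive has 774 entries.
archive = [
    Mechanic("Variable Gravity", frozenset({"physics", "gravity"})),
    Mechanic("Combo Chain", frozenset({"scoring"})),
    Mechanic("Wall Jump", frozenset({"physics", "movement"})),
]
plan = MechanicPlan(
    prompt="a platformer with strange physics",
    mechanics=retrieve(archive, {"physics", "gravity"}),
)
# If the generated code only implements "Wall Jump", the plan flags the gap.
missing = plan.unmet({"Wall Jump"})
```

Keeping the plan as data rather than free text is what makes "held accountable" checkable: a later stage can compare the planned mechanic names against what the code actually implements.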

Figure 1 (System Pipeline Overview): The multi-agent orchestration loop, featuring the explicit mechanic planning and refinement stages.

2. CreativeProxyReward: Killing the Hype

To guard against Goodhart's Law (here, the model writing "creative-sounding" descriptions for boring games), the authors use a weighted formula in which 65% of the score comes from deterministic signals:

  • Mechanic Realization: Did you actually code the plan?
  • Structural Change: Did you change the core rules, or just the colors?
  • Runtime Hard-Gate: If the code doesn't execute in a browser, its reward is slashed by 50%.
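The scoring logic above can be sketched as follows. Only two numbers come from the text: the 65% deterministic share and the 50% penalty for code that fails to run. Everything else is an assumption made for illustration: the function name, the even split between the two deterministic signals, the 0-to-1 score range, and applying the gate as a multiplier on the final score.

```python
DETERMINISTIC_SHARE = 0.65  # from the text: 65% of the score is deterministic
JUDGE_SHARE = 1.0 - DETERMINISTIC_SHARE  # remainder goes to the LLM-judged part
RUNTIME_PENALTY = 0.5  # from the text: reward is cut by 50% on runtime failure


def creative_proxy_reward(
    mechanic_realization: float,
    structural_change: float,
    judge_score: float,
    runs_in_browser: bool,
) -> float:
    """Hypothetical reward sketch; all scores are assumed to lie in [0, 1]."""
    for value in (mechanic_realization, structural_change, judge_score):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must be in [0, 1]")

    # ASSUMPTION: the two deterministic signals are weighted equally.
    deterministic = 0.5 * (mechanic_realization + structural_change)
    reward = DETERMINISTIC_SHARE * deterministic + JUDGE_SHARE * judge_score

    # Hard gate: a game that does not execute loses half of its reward.
    if not runs_in_browser:
        reward *= RUNTIME_PENALTY
    return reward


# A game the judge loves, but with no realized mechanic and no rule change.
hype_only = creative_proxy_reward(0.0, 0.0, 1.0, runs_in_browser=True)
# A fully realized game that crashes in the browser.
crashes = creative_proxy_reward(1.0, 1.0, 1.0, runs_in_browser=False)
```

Under these assumptions, a game that impresses the judge but realizes nothing is capped at the judge's 35% share, which is the point of weighting deterministic signals more heavily.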

Figure 2 (Reward Signal Weights): The CreativeProxyReward weights prioritize structural and programmatic success over LLM-judged "creativity."

3. Lineage-Aware Memory

Borrowing from MemRL (Reinforcement Learning on Episodic Memory), CreativeGame stores experience in "Lineages." In a lineage of Plants vs. Zombies, the "v4" agent has access to the successes and failures of "v1-v3," allowing for a cohesive accumulation of design ideas.
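One way to picture a lineage is as an ordered log of versions that gets summarized for the next agent. The sketch below is illustrative only: the `Lineage` and `VersionRecord` types, their fields, the sample rewards, and the notes are all hypothetical, and the source does not specify how CreativeGame stores or retrieves lineage memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VersionRecord:
    """Hypothetical record of one generated version within a lineage."""
    version: int
    reward: float
    note: str  # what worked or failed in this version


@dataclass
class Lineage:
    """Hypothetical per-game memory: an ordered history of versions."""
    name: str
    records: list[VersionRecord] = field(default_factory=list)

    def add(self, reward: float, note: str) -> VersionRecord:
        """Append the next version; numbering starts at v1."""
        record = VersionRecord(len(self.records) + 1, reward, note)
        self.records.append(record)
        return record

    def context_for_next(self) -> str:
        """Summarize every earlier version for the agent building the next one."""
        return "\n".join(
            f"v{r.version} (reward {r.reward:.2f}): {r.note}" for r in self.records
        )


# Made-up history for illustration; rewards and notes are not from the paper.
lineage = Lineage("Plants vs. Zombies")
lineage.add(0.40, "basic lane defense works")
lineage.add(0.35, "new plant type broke the game loop")
lineage.add(0.60, "resource economy added and stable")
context = lineage.context_for_next()  # what a "v4" agent would be shown
```

Recording failures alongside successes matters here: the v2 note tells the v4 agent which change to avoid repeating, not just which ideas to keep.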

Results: From Mimicry to Innovation

The most striking evidence of the system's success is found in its case studies. For instance, in a 4-generation evolution of a Flappy Bird-style game:

  • v1: Simple obstacle dodging.
  • v4: Reinterpreted as "Route Writing," where perfect passes through gates actually rewrite the geometry of future obstacles.

Performance Metrics

  • Reliability: Success rate boosted to >98% via a Tier-1 Deep Static Analyzer and Tier-2 Browser Execution check.
  • Scale: The system managed a global archive of 774 unique mechanics.
  • Efficiency: The "Visual" generation stage consumes the most tokens (~34%), emphasizing that polish is computationally expensive, while the core mechanic planning is relatively lean (~8%).
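The two-tier check mentioned under Reliability can be sketched as a cheap static pass followed by an execution pass. This is a rough illustration, not the authors' analyzer: the regex checks are assumptions loosely based on the failure modes named earlier (missing event listeners, broken game loops), and `run_in_browser` is a hypothetical callback standing in for a real headless-browser run.

```python
import re
from typing import Callable

# ASSUMPTION: illustrative static checks, far shallower than a real analyzer.
STATIC_CHECKS = {
    "script tag": re.compile(r"<script\b", re.IGNORECASE),
    "input listener": re.compile(r"addEventListener\s*\("),
    "game loop": re.compile(r"requestAnimationFrame\s*\(|setInterval\s*\("),
}


def static_issues(html: str) -> list[str]:
    """Tier 1: names of the checks the source text fails, without running it."""
    return [name for name, pattern in STATIC_CHECKS.items() if not pattern.search(html)]


def validate(html: str, run_in_browser: Callable[[str], bool]) -> tuple[bool, list[str]]:
    """Run Tier 1 first; only pay for Tier 2 execution if Tier 1 is clean."""
    issues = static_issues(html)
    if issues:
        return False, issues
    if not run_in_browser(html):
        return False, ["runtime failure"]
    return True, []


good = '<script>addEventListener("keydown", f); requestAnimationFrame(loop);</script>'
no_loop = '<script>addEventListener("keydown", f);</script>'

# Stub executors stand in for an actual browser run.
ok_result = validate(good, lambda _html: True)
static_fail = validate(no_loop, lambda _html: True)
runtime_fail = validate(good, lambda _html: False)
```

Ordering the tiers this way keeps the expensive browser run for candidates that already pass the cheap textual checks.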

Figure 3 (Case Study Grid): A live-execution grid showing the evolution of four distinct lineages from Round 1 to Round 4.

Critical Analysis & Takeaways

The brilliance of CreativeGame lies in its Formal Foundation. By giving a game a formal mathematical definition, the authors provide the LLM with a structural map of what to change.

Limitations: The system still relies on LLM backends (GPT/Kimi), meaning it is limited by the underlying reasoning capabilities of the base models. Furthermore, "Playability" scores are still model-proxies and haven't been fully validated against human "fun" metrics.

Future Outlook: This framework sets a blueprint for "Self-Evolving Content." Imagine a game that observes its own failure through the runtime validator and automatically "evolves" its next version to be more robust and structurally complex. CreativeGame isn't just generating code; it's simulating a digital game designer.
