From Static Templates to Dynamic Runtime Graphs: Evolution of the Agentic Computation Graph (ACG)
Abstract

This survey introduces the concept of Agentic Computation Graphs (ACGs) to unify various LLM-based systems under a single structural optimization framework. It organizes methods like AFlow and ADAS into a taxonomy of static and dynamic workflow optimization, and proposes standards for how agentic structures should be searched, generated, and evaluated.

TL;DR

The "Wild West" of LLM agent design is finally getting a map. This seminal survey moves beyond simple prompt engineering to treat agent workflows as Agentic Computation Graphs (ACGs). By distinguishing between static templates (optimized offline) and dynamic graphs (generated at runtime), the authors provide a rigorous framework for optimizing how LLMs, tools, and verifiers collaborate.

Core Insight: The secret to SOTA performance isn't just a better LLM; it's a better structure—one that can prune itself for easy tasks and rewrite its own logic for hard ones.


The Structural Pain Point: Why Prompts Aren't Enough

Most developers treat agentic workflows as fixed code scaffolds. You write a "Plan -> Execute -> Verify" loop and hope for the best. However, as the paper points out, this leads to "brittle structural assumptions."

If your retrieval fails and your template doesn't have a "Retry" node, a better prompt won't save you. If your task is simple but your workflow is a 10-agent powerhouse, you are wasting tokens. The industry has lacked a way to optimize these topologies—until now.


Methodology: The ACG Hierarchy

The paper introduces a clean abstraction: Template $\rightarrow$ Realized Graph $\rightarrow$ Trace.

  1. ACG Template ($\bar{\mathcal{G}}$): The reusable blueprint.
  2. Realized Graph ($G_{run}$): The specific structure deployed for a single query (might be a subset of the template).
  3. Execution Trace ($\tau$): What actually happened (the logs).
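
The three-level hierarchy can be sketched as simple data structures. This is a minimal illustration only; the class and field names are invented for this sketch and do not come from the survey:

```python
from dataclasses import dataclass, field

@dataclass
class ACGTemplate:
    """Reusable blueprint: the full set of agents/tools and allowed edges."""
    nodes: set[str]
    edges: set[tuple[str, str]]  # (src, dst) pairs the template permits

@dataclass
class RealizedGraph:
    """Per-query structure: a subset of the template deployed for one input."""
    template: ACGTemplate
    active_nodes: set[str]
    active_edges: set[tuple[str, str]]

    def is_valid(self) -> bool:
        # A realized graph may only use nodes and edges the template allows.
        return (self.active_nodes <= self.template.nodes
                and self.active_edges <= self.template.edges)

@dataclass
class ExecutionTrace:
    """What actually happened: an ordered log of (node, output) events."""
    events: list[tuple[str, str]] = field(default_factory=list)
```

For example, a "Plan -> Code -> Verify" template might be realized for an easy query as just "Plan -> Code", which `is_valid()` would accept because it is a subset of the blueprint.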

The Taxonomy of Plasticity

The authors organize the field into two major hemispheres:

  • Static Optimization: You search for the "perfect" reusable template (like AFlow or DSPy) before the user even types a query.
  • Dynamic Optimization: The agent creates or edits its own workflow on the fly.
    • Select/Prune: Activating only needed agents (e.g., Adaptive Graph Pruning).
    • Pre-execution: Drawing a map before starting (e.g., Assemble Your Crew).
    • In-execution: Changing the engine while the car is moving (e.g., MetaGen).
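
The Select/Prune idea can be made concrete with a toy dispatcher that activates only the agents a query appears to need. Everything here is a hypothetical sketch: the agent names and the length-based difficulty heuristic are invented stand-ins (a real system would use a learned scorer):

```python
FULL_PIPELINE = ["planner", "retriever", "coder", "verifier", "debugger"]

def estimate_difficulty(query: str) -> float:
    """Toy proxy: longer queries are treated as harder (stand-in for a learned scorer)."""
    return min(len(query.split()) / 50.0, 1.0)

def select_subgraph(query: str) -> list[str]:
    """Prune the template down to only the agents this query needs."""
    d = estimate_difficulty(query)
    if d < 0.2:   # easy: a single agent answers directly
        return ["coder"]
    if d < 0.6:   # medium: add planning and verification
        return ["planner", "coder", "verifier"]
    return FULL_PIPELINE  # hard: deploy the full multi-agent workflow
```

The point of the taxonomy is visible even in this caricature: the same template serves every query, but easy tasks never pay for the 5-agent "powerhouse."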

[Figure: Overall Framework of Workflow Optimization]

Key Experimental Insights: Quality vs. Cost

The paper defines the "Quality-Cost" view through a schematic formula: $$\max \mathbb{E} [ R(\tau; x) - \lambda C(\tau) ]$$ where $R$ is task success and $C$ is cost (tokens, time, money).
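
The objective is easy to evaluate numerically. In the sketch below, the two candidate traces and their reward/cost numbers are invented purely for illustration:

```python
def objective(reward: float, cost_tokens: int, lam: float = 1e-4) -> float:
    """Schematic quality-cost score: R(tau; x) - lambda * C(tau)."""
    return reward - lam * cost_tokens

# Hypothetical candidates: (task success rate, token cost per run)
traces = {
    "single_agent": (0.70, 1_000),
    "full_crew":    (0.85, 20_000),
}

best = max(traces, key=lambda name: objective(*traces[name]))
```

With this $\lambda$, the "full crew" scores $0.85 - 2.0 = -1.15$ while the single agent scores $0.70 - 0.1 = 0.60$: once cost is priced in, the bigger workflow loses despite its higher raw accuracy, which is exactly the trade-off the formula makes explicit.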

When to use what?

  • Static is enough when the task is repetitive and the evaluator is reliable (e.g., Code Gen).
  • Selection beats Generation when the tasks vary in difficulty but not in fundamental logic.
  • Editing is unavoidable in high-uncertainty environments where you only learn what to do next after a tool fails.

[Table: Comparison of Leading Methods]

Critical Analysis: The Evaluation Gap

The survey's most aggressive (and necessary) contribution is the Minimum Reporting Protocol. The authors argue that reporting "Accuracy" is no longer enough. If a paper claims a 5% gain, we need to know:

  • Graph-level metrics: How many nodes? How much communication volume?
  • Structural variance: Did the agent use the same graph every time, or did it adapt?
  • Ablation: Was the gain from the "smart" new node or just more compute?
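
A report following this protocol could be as simple as the function below, which aggregates per-run logs into the three axes the survey asks for. The field names are illustrative, not prescribed by the paper:

```python
import statistics

def structural_report(runs: list[dict]) -> dict:
    """Summarize graph-level metrics, structural variance, and a compute baseline."""
    node_counts = [r["num_nodes"] for r in runs]
    return {
        "accuracy": sum(r["success"] for r in runs) / len(runs),
        "mean_nodes": statistics.mean(node_counts),          # graph-level size
        "node_variance": statistics.pvariance(node_counts),  # did the structure adapt?
        "total_tokens": sum(r["tokens"] for r in runs),      # baseline for compute ablations
    }
```

A `node_variance` of zero would reveal that a "dynamic" method actually used the same graph every time, and `total_tokens` lets a reviewer check whether a 5% accuracy gain was bought with 10x more compute.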

Takeaway for the Future

We are moving toward "Architectural Plasticity." Future LLM systems won't be static bots; they will be fluid "Computation Graphs" that reshape themselves for every dollar spent and every error encountered.

Key Research Front: Structural Credit Assignment. How do we know exactly which edge in a 50-node graph caused the failure? Solving this will be the "Backpropagation" moment for agentic workflows.


Note: This survey identifies 77 core papers and is a must-read for anyone building production-grade LLM Orchestrators.
