[CVPR 2025] AnomalyAgent: Breaking the "One-Pass" Limit with Agentic Reinforcement Learning for Industrial Synthesis
Abstract

AnomalyAgent is an agentic framework for industrial anomaly synthesis that formulates defect generation as an iterative, tool-augmented sequential decision-making task. By integrating Multimodal Large Language Models (MLLMs) with specialized tools and reinforcement learning, it achieves SOTA results on MVTec-AD, reaching 99.3% image-level and 74.2% pixel-level AP.

TL;DR

Industrial anomaly synthesis finally moves from "one-shot guessing" to "iterative reasoning." AnomalyAgent is the first multimodal agentic framework that treats defect generation as a sequential decision process. By using tool-augmented reinforcement learning (GRPO) and a closed-loop feedback mechanism, it achieves 99.3% image-level AP on MVTec-AD, setting a new benchmark for zero-shot industrial data synthesis.

Problem & Motivation: The "Semantic Drift" in Open-Loop Generation

In the industrial world, defect data is notoriously scarce. While Generative AI (Stable Diffusion, etc.) has provided a way to synthesize "pseudo-defects," existing methods suffer from a fundamental flaw: Open-Loop Generation.

The model takes a prompt, generates a pixel patch, and stops. There is no "self-reflection." If the generated scratch looks like a paintbrush stroke, or is placed in a physically impossible location (like a crack floating in mid-air), the model has no way to correct it. Prior SOTA zero-shot methods rely either on handcrafted perturbations (DRAEM) or on static prompts, leading to semantic drift and low realism.
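
To make the contrast concrete, here is a minimal sketch of the two paradigms. `generate`, `evaluate`, and `refine` are hypothetical callables standing in for a diffusion editor and a critic; they are illustrative assumptions, not the paper's API.

```python
from typing import Any, Callable, Tuple

def open_loop(generate: Callable[[], Any]) -> Any:
    # One pass: generate once and stop. A misplaced or painterly defect
    # can never be corrected.
    return generate()

def closed_loop(
    generate: Callable[[], Any],
    evaluate: Callable[[Any], Tuple[float, str]],  # -> (score, textual feedback)
    refine: Callable[[Any, str], Any],
    max_turns: int = 5,
    threshold: float = 0.8,
) -> Any:
    # Iterate: generate, critique, refine, until the critic is satisfied
    # or the turn budget is exhausted.
    image = generate()
    for _ in range(max_turns):
        score, feedback = evaluate(image)
        if score >= threshold:
            break
        image = refine(image, feedback)
    return image
```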

Methodology: The "Perception-Reflection-Action" Loop

The authors propose a radical shift: replace the generator with an agent. AnomalyAgent uses the Qwen3-VL-4B-Thinking backbone as its "brain" to coordinate five specialized tools (a minimal dispatch sketch follows the tool list).

1. The Toolset

  • PG (Prompt Generation): Translates abstract concepts into localized editing instructions.
  • IG (Image Generation): The "hand" that edits the pixels.
  • QE (Quality Evaluation): The "critic" that provides a score and textual feedback.
  • KR (Knowledge Retrieval): Fetches physical property descriptions (e.g., how brittle shell fractures look).
  • MG (Mask Generation): Produces the ground-truth segmentation label.
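
A minimal sketch of how such a tool-dispatch loop could look. The `policy` callable stands in for the MLLM and the `tools` dict for the five tools above; the signatures are assumptions for illustration, not the authors' interface.

```python
from typing import Any, Callable, Dict, Tuple

def run_agent(
    policy: Callable[[dict], Tuple[str, dict]],  # MLLM: state -> (tool name, tool args)
    tools: Dict[str, Callable[..., Any]],        # {"PG": ..., "IG": ..., "QE": ..., "KR": ..., "MG": ...}
    state: dict,
    max_steps: int = 10,
) -> dict:
    # Perception-reflection-action loop: the agent inspects the current
    # state, selects a tool, and folds the tool's output back into the
    # state, so that feedback (e.g. QE's critique) shapes the next turn.
    for _ in range(max_steps):
        tool_name, args = policy(state)
        if tool_name == "STOP":
            break
        state[tool_name] = tools[tool_name](**args)
    return state
```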

2. The Training Strategy

The most impressive part of this work is how they train the agent without manual labels. They use Reverse Synthesis: taking a real anomaly image, reconstructing its "normal" version, and then building an N-step trajectory of how to get from normal back to anomaly.
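
A minimal sketch of the idea, where `inpaint_normal` and `interpolate_edit` are hypothetical helpers (not code from the paper) that remove the defect and partially re-apply it:

```python
def build_trajectory(anomaly_image, n_steps: int = 4) -> list:
    # Reverse synthesis: recover the defect-free image first, then record
    # intermediate states moving it back toward the real anomaly. Each
    # step yields (state, next state) supervision with no manual labels.
    normal = inpaint_normal(anomaly_image)   # hypothetical: remove the defect
    trajectory = [normal]
    for t in range(1, n_steps + 1):
        alpha = t / n_steps                  # fraction of the edit applied
        trajectory.append(interpolate_edit(normal, anomaly_image, alpha))
    return trajectory                        # ends at the real anomaly
```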

[Figure: Overall Architecture]

3. Reinforcement Learning (GRPO)

The agent is optimized via Group Relative Policy Optimization (GRPO). Instead of a single scalar reward, it optimizes a composite of three signals (sketched after the list):

  • Task Reward: Is the final image realistic?
  • Reflection Reward: Did the agent actually improve the image between turns?
  • Behavior Reward: Did the agent follow the correct tool sequence (e.g., PG → IG → QE)?
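
A minimal sketch of the two core pieces, assuming a simple weighted sum of the three rewards (the weights are illustrative placeholders, not values from the paper) plus GRPO's group-normalized advantage:

```python
import numpy as np

def composite_reward(task_r: float, reflection_r: float, behavior_r: float,
                     weights=(1.0, 0.5, 0.5)) -> float:
    # Weighted sum of the three signals; the weights are placeholders.
    w_task, w_refl, w_beh = weights
    return w_task * task_r + w_refl * reflection_r + w_beh * behavior_r

def grpo_advantages(group_rewards) -> np.ndarray:
    # GRPO samples a *group* of trajectories for the same input and
    # normalizes each reward against the group mean/std, so no learned
    # value model (critic network) is needed.
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)
```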

Experiments & Results: SOTA Performance

AnomalyAgent was tested on the industry-standard MVTec-AD and VisA datasets; a short sketch of how these metrics are conventionally computed follows the results.

  • Localization Accuracy: It reaches a mean 98.0 AUC and 74.2 AP at the pixel level, significantly outperforming the image-editing capabilities of GPT-4 or Gemini, which lack domain-specific reasoning.
  • Downstream Utility: Detectors trained on AnomalyAgent's data achieved 57% classification accuracy, a +12.3% jump over non-agentic baselines.
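
For reference, a minimal sketch of how these metrics are conventionally computed on MVTec-AD-style data, using scikit-learn and tiny dummy arrays (not the paper's evaluation code):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Image level: one anomaly score and one binary label per image.
image_labels = np.array([0, 0, 1, 1])
image_scores = np.array([0.1, 0.3, 0.8, 0.6])
print("image-level AP:", average_precision_score(image_labels, image_scores))

# Pixel level: flatten ground-truth masks and predicted anomaly maps.
rng = np.random.default_rng(0)
gt_masks = rng.integers(0, 2, size=(4, 32, 32)).ravel()
pred_maps = rng.random((4, 32, 32)).ravel()
print("pixel-level AP:   ", average_precision_score(gt_masks, pred_maps))
print("pixel-level AUROC:", roc_auc_score(gt_masks, pred_maps))
```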

[Figure: Experimental Results Comparison]

Visualizations show that while previous models create artifacts or "unnatural" shapes, AnomalyAgent accounts for surface textures and material properties (physics-informed synthesis).

[Figure: Visual Comparison]

Critical Analysis & Conclusion

Takeaway

AnomalyAgent proves that LLM reasoning isn't just for text or code; it can be "grounded" in pixel-space editing through tool use. The iterative refinement loop acts as an inductive bias for realism that single-step models simply cannot match.

Limitations

Despite its success, the framework is computationally heavy. Each "good sample" requires multiple model calls, taking upwards of 100 seconds. While the cost is lower than human annotation, real-time "on-the-fly" synthesis remains a challenge.

Future Outlook

This "Agentic Synthesis" paradigm could be the "next big thing" for 3D asset generation or medical data simulation, where "correctness" and "feedback" are more important than sheer generation speed.
