[CVPR 2026] TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

TRAP (CoT-Reasoning Adversarial Patch) is the first targeted adversarial attack framework against Vision-Language-Action (VLA) models equipped with Chain-of-Thought (CoT) reasoning. By placing a physically printable adversarial patch in the environment, the attack hijacks the model's intermediate reasoning steps to induce malicious robot behaviors (e.g., delivering a knife instead of an apple) with high success rates across mainstream VLA architectures like MolmoACT and GraspVLA.

TL;DR

Researchers have discovered a critical vulnerability in the latest "Reasoning VLAs." By placing a simple printed coaster (an adversarial patch) on a table, an attacker can hijack a robot's Chain-of-Thought (CoT). Even if you tell the robot to "pick up the apple," the adversarial patch can trick its internal "brain" into thinking the plan is to "pick up the knife"—resulting in a successful, yet dangerous, execution of the wrong task.

The "Competition Mechanism": Why CoT is a Weak Point

In modern Vision-Language-Action (VLA) models, CoT is designed to act as a bridge, breaking down complex instructions into intermediate sub-goals (like bounding boxes or textual plans). However, the authors of TRAP identified a Competition Mechanism: when the user's text instruction and the model's internal CoT conflict, the CoT often wins.

Through preliminary analysis (Instruction Masking and Cross-Sample Shuffling), the team found that CoT tokens are not just "flavor text"—they are vital drivers of the final action. This discovery transformed CoT from a safety feature into a primary attack vector.
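The probing logic behind these two ablations can be illustrated with a toy numerical sketch. The model below is not the paper's code: the action head, embeddings, and the weights `W_INSTR`/`W_COT` are all hypothetical, chosen only to show how masking the instruction versus shuffling the CoT reveals which signal actually drives the action.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a VLA action head: the action is a weighted blend of
# two conditioning signals. The weights are hypothetical, picked only to
# mimic a model whose actions are dominated by its CoT.
W_INSTR, W_COT = 0.2, 0.8

def predict_action(instr_emb, cot_emb):
    return W_INSTR * instr_emb + W_COT * cot_emb

instr = rng.normal(size=8)   # embedding of the user's text instruction
cot = rng.normal(size=8)     # embedding of the model's own CoT trace
baseline = predict_action(instr, cot)

# Instruction Masking: zero out the text instruction, keep the CoT.
drift_mask_instr = np.linalg.norm(predict_action(np.zeros(8), cot) - baseline)

# Cross-Sample Shuffling: swap in a CoT trace from a different sample.
other_cot = rng.normal(size=8)
drift_shuffle_cot = np.linalg.norm(predict_action(instr, other_cot) - baseline)

# When the CoT dominates (W_COT > W_INSTR), corrupting the CoT moves the
# action far more than deleting the instruction does.
print(drift_shuffle_cot > drift_mask_instr)
```

In a real VLA the "drift" would be measured as task success or action error rather than embedding distance, but the asymmetry is the same diagnostic the paper's ablations rely on.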

Methodology: The TRAP Framework

TRAP (CoT-Reasoning Adversarial Patch) doesn't just add noise to an image; it optimizes a specific visual pattern to "rewrite" the model's intent.

1. Joint Optimization

Instead of just attacking the final motor commands, TRAP uses a dual-loss objective:

  • CoT Hijacking Loss ($\mathcal{L}_{cot}$): Aligns the generated reasoning tokens with the attacker's target sequence (e.g., "moving toward the knife").
  • Action Loss ($\mathcal{L}_{action}$): Ensures the robot's physical movements are consistent with the hijacked plan to prevent "mode collapse" or erratic behavior.
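The dual objective can be sketched as $\mathcal{L} = \mathcal{L}_{cot} + \lambda \, \mathcal{L}_{action}$. Below is a minimal NumPy sketch assuming token-level cross-entropy for the CoT term and mean-squared error for continuous actions; the paper's exact loss forms and the weighting `lam` are assumptions for illustration.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Softmax cross-entropy averaged over tokens.
    logits: (N, vocab), targets: (N,) integer token ids."""
    z = logits - logits.max(axis=-1, keepdims=True)       # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

def trap_loss(cot_logits, target_cot, pred_actions, target_actions, lam=1.0):
    """Joint TRAP-style objective (sketch): push the generated CoT toward the
    attacker's target sequence, while keeping the executed actions consistent
    with that hijacked plan."""
    l_cot = cross_entropy(cot_logits, target_cot)          # CoT hijacking term
    l_action = np.mean((pred_actions - target_actions) ** 2)  # action term
    return l_cot + lam * l_action
```

In the actual attack this scalar would be backpropagated through the frozen VLA to the patch pixels only; here it just shows how the two terms combine.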

2. Bridging the Sim-to-Real Gap

To make the attack work on a real printer and paper, the authors used:

  • Homography Transformation: Modeling how the patch looks from different camera angles.
  • Color Calibration: Using an MLP to map digital colors to the specific CMYK/RGB gamut of a physical printer.
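The homography step amounts to warping patch coordinates through a 3×3 matrix sampled per training view, so the optimized pattern stays effective under perspective change. A minimal sketch of that projective mapping (the color-calibration MLP is omitted; the matrices here are illustrative, not the paper's sampling scheme):

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2D points through a 3x3 homography.
    H: (3, 3) projective transform, pts: (N, 2) pixel coordinates."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # lift to homogeneous coords
    warped = homog @ H.T
    return warped[:, :2] / warped[:, 2:3]             # divide out the w component

# Example: a pure translation by (3, 5) expressed as a homography.
H_shift = np.array([[1.0, 0.0, 3.0],
                    [0.0, 1.0, 5.0],
                    [0.0, 0.0, 1.0]])
corners = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(apply_homography(H_shift, corners))
```

During patch optimization, randomizing `H` over plausible camera poses (an Expectation-over-Transformation-style trick) is what makes the printed patch robust to viewpoint.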

[Figure: Model architecture and attack flow]

Experimental Battleground: SOTA Benchmarks

The team tested TRAP against three major VLA architectures:

  1. MolmoACT: Integrated architecture using discrete tokens.
  2. GraspVLA: Integrated architecture focused on continuous grasp poses.
  3. InstructVLA: Hierarchical architecture using textual sub-tasks.

| Method | MolmoACT ASR | InstructVLA ASR | GraspVLA ASR | Average ASR |
| :--- | :--- | :--- | :--- | :--- |
| Random Noise | 0.97% | 3.39% | 0.32% | 1.56% |
| Action-Only | 9.68% | 6.77% | 0.00% | 5.48% |
| TRAP (Ours) | 48.06% | 33.71% | 75.84% | 52.54% |

[Figure: Qualitative results of CoT hijacking]

The results confirm that targeting the CoT is roughly 10x more effective than traditional end-to-end adversarial attacks (52.54% vs. 5.48% average ASR for the action-only baseline).

Real-World Hazardous Redirection

The most chilling part of the study involved a physical Franka Panda robot. Under normal conditions, the robot would pick up a carrot as instructed. With the TRAP patch present, the robot's internal CoT shifted its attention from the carrot to a nearby knife, successfully completing the "malicious" redirection in 33.3% of full-horizon trials.

[Figure: Real-world experimental setup]

Critical Insights & Future Outlook

The Takeaway: Explicit reasoning in VLAs is a "leaked" version of the model's internal state. While it makes the robot's behavior more explainable to humans, it also provides a clear "handle" for adversaries to grab and steer.

Limitations: Currently, TRAP is visible to the human eye. Future iterations of this research could focus on "stealthy" patches that look like natural textures (e.g., a wood-grain table or a brand logo) but still harbor the same hijacking capability.

Conclusion: As we move toward general-purpose robot assistants, we must treat CoT tokens as safety-critical data. The industry needs "reasoning-checkers" that act as a watchdog, ensuring the robot's internal plan never deviates from the human's original command.
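As a rough illustration of what such a watchdog could look like, the sketch below runs a keyword-level consistency check between the instruction and the generated plan. This is a hypothetical toy, not a proposed defense from the paper: a real checker would need semantic matching, and the hazardous-object blocklist here is invented for the example.

```python
def reasoning_checker(instruction: str, cot_plan: str,
                      hazardous=("knife", "scissors", "blade")) -> bool:
    """Return True if the CoT plan looks consistent with the instruction.

    Flags plans that mention a hazardous object the user never asked for,
    e.g. instruction 'pick up the apple' vs. plan 'moving toward the knife'.
    """
    instr = instruction.lower()
    plan = cot_plan.lower()
    for obj in hazardous:
        if obj in plan and obj not in instr:
            return False  # plan drifted toward an unrequested hazardous object
    return True

# Usage: gate action execution on the check passing.
print(reasoning_checker("pick up the apple", "moving toward the knife"))
print(reasoning_checker("pick up the apple", "grasp the apple and lift it"))
```

Even this crude gate would have caught the carrot-to-knife redirection above, which suggests that monitoring CoT tokens at runtime is a cheap first line of defense.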
