TRAP (CoT-Reasoning Adversarial Patch) is the first targeted adversarial attack framework against Vision-Language-Action (VLA) models equipped with Chain-of-Thought (CoT) reasoning. By placing a physically printable adversarial patch in the environment, the attack hijacks the model's intermediate reasoning steps to induce malicious robot behaviors (e.g., delivering a knife instead of an apple), with high success rates across mainstream VLA architectures such as MolmoACT, InstructVLA, and GraspVLA.
TL;DR
Researchers have discovered a critical vulnerability in the latest "Reasoning VLAs." By placing a simple printed coaster (an adversarial patch) on a table, an attacker can hijack a robot's Chain-of-Thought (CoT). Even if you tell the robot to "pick up the apple," the patch can trick its internal "brain" into planning to "pick up the knife" instead, and the robot then smoothly executes that wrong, dangerous task.
The "Competition Mechanism": Why CoT is a Weak Point
In modern Vision-Language-Action (VLA) models, CoT is designed to act as a bridge, breaking down complex instructions into intermediate sub-goals (like bounding boxes or textual plans). However, the authors of TRAP identified a Competition Mechanism: when the user's text instruction and the model's internal CoT conflict, the CoT often wins.
Through two preliminary probes (Instruction Masking and Cross-Sample Shuffling), the team found that CoT tokens are not just "flavor text": they are vital drivers of the final action. This discovery transformed CoT from a safety feature into a primary attack vector.
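To get a feel for these probes, here is a minimal PyTorch sketch. The two-output model interface and the `forced_cot` argument are hypothetical conveniences for illustration, not the actual API of any of the tested models:

```python
import torch

@torch.no_grad()
def probe_cot_influence(model, images, instructions):
    """Measure how much the final action depends on the CoT vs. the instruction.

    Assumes a hypothetical interface: model(images, instructions, forced_cot=...)
    returns (cot_tokens, actions). Real VLA codebases will differ.
    """
    cot, actions = model(images, instructions)

    # Instruction Masking: blank the text instruction but keep the original CoT.
    _, act_no_instr = model(images, [""] * len(instructions), forced_cot=cot)

    # Cross-Sample Shuffling: give each sample another sample's CoT.
    perm = torch.randperm(cot.size(0))
    _, act_shuffled = model(images, instructions, forced_cot=cot[perm])

    # A small first gap plus a large second gap means the actions follow the
    # CoT rather than the instruction -- the "Competition Mechanism".
    return {
        "instr_masked_drift": (actions - act_no_instr).abs().mean().item(),
        "cot_shuffled_drift": (actions - act_shuffled).abs().mean().item(),
    }
```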
Methodology: The TRAP Framework
TRAP (CoT-Reasoning Adversarial Patch) doesn't just add noise to an image; it optimizes a specific visual pattern to "rewrite" the model's intent.
1. Joint Optimization
Instead of just attacking the final motor commands, TRAP uses a dual-loss objective (sketched after this list):
- CoT Hijacking Loss ($\mathcal{L}_{cot}$): Aligns the generated reasoning tokens with the attacker's target sequence (e.g., "moving toward the knife").
- Action Loss ($\mathcal{L}_{action}$): Ensures the robot's physical movements are consistent with the hijacked plan to prevent "mode collapse" or erratic behavior.
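Here is a minimal PyTorch sketch of what such a dual-loss objective could look like. The model interface, the cross-entropy/MSE loss forms, and the weighting term `lam` are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def trap_objective(model, patched_image, instruction,
                   target_cot_ids, target_action, lam=1.0):
    """Joint CoT-hijacking + action-consistency loss for patch optimization.

    Hypothetical interface: the model returns per-token CoT logits and a
    continuous action prediction; lam trades off the two terms (assumed).
    """
    cot_logits, pred_action = model(patched_image, instruction)

    # L_cot: push the generated reasoning toward the attacker's target tokens.
    l_cot = F.cross_entropy(
        cot_logits.reshape(-1, cot_logits.size(-1)),
        target_cot_ids.reshape(-1),
    )

    # L_action: keep the motor output consistent with the hijacked plan,
    # preventing collapse into erratic, non-executable motions.
    l_action = F.mse_loss(pred_action, target_action)

    return l_cot + lam * l_action
```

In an actual attack loop, the patch pixels would then be updated by gradient descent on this objective, e.g. projected gradient descent clamped to the printable color range.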
2. Bridging the Sim-to-Real Gap
To make the patch survive a real printer, real paper, and a real camera, the authors used two components (sketched after this list):
- Homography Transformation: Modeling how the patch looks from different camera angles.
- Color Calibration: Using an MLP to map digital colors to the specific CMYK/RGB gamut of a physical printer.
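Below is a hedged sketch of both components, using kornia for the perspective warp. The corner-jitter range, the MLP size, and the idea of fitting the color model on photographed calibration swatches are assumptions for illustration:

```python
import torch
import torch.nn as nn
import kornia.geometry as KG

class PrinterColorMLP(nn.Module):
    """Maps digital RGB to the colors a specific printer actually produces.

    Assumed training recipe: regress photographed calibration swatches
    against their digital source values.
    """
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # keep output in [0, 1]
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:  # rgb: (..., 3)
        return self.net(rgb)

def random_viewpoint(patch: torch.Tensor, jitter: float = 0.1) -> torch.Tensor:
    """Warp a (1, 3, H, W) patch with a random homography to simulate
    how the printed patch appears from different camera angles."""
    _, _, h, w = patch.shape
    src = torch.tensor([[[0., 0.], [w - 1., 0.], [w - 1., h - 1.], [0., h - 1.]]])
    dst = src + (torch.rand_like(src) - 0.5) * 2 * jitter * min(h, w)
    H = KG.get_perspective_transform(src, dst)
    return KG.warp_perspective(patch, H, dsize=(h, w))
```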

Experimental Battleground: SOTA Benchmarks
The team tested TRAP against three major VLA architectures:
- MolmoACT: Integrated architecture using discrete tokens.
- GraspVLA: Integrated architecture focused on continuous grasp poses.
- InstructVLA: Hierarchical architecture using textual sub-tasks.
ASR = Attack Success Rate.

| Method | MolmoACT ASR | InstructVLA ASR | GraspVLA ASR | Average ASR |
| :--- | :--- | :--- | :--- | :--- |
| Random Noise | 0.97% | 3.39% | 0.32% | 1.56% |
| Action-Only | 9.68% | 6.77% | 0.00% | 5.48% |
| TRAP (Ours) | 48.06% | 33.71% | 75.84% | 52.54% |

The results confirm that targeting the CoT is roughly 10x more effective than a traditional end-to-end, action-only adversarial attack (52.54% vs. 5.48% average ASR).
Real-World Hazardous Redirection
The most chilling part of the study involved a physical Franka Panda robot. Under normal conditions, the robot would pick up a carrot as instructed. With the TRAP patch present, the robot's internal CoT shifted its attention from the carrot to a nearby knife, successfully completing the "malicious" redirection in 33.3% of full-horizon trials.

Critical Insights & Future Outlook
The Takeaway: Explicit reasoning in VLAs is a "leaked" version of the model's internal state. While it makes the robot's behavior more explainable to humans, it also provides a clear "handle" for adversaries to grab and steer.
Limitations: Currently, TRAP is visible to the human eye. Future iterations of this research could focus on "stealthy" patches that look like natural textures (e.g., a wood-grain table or a brand logo) but still harbor the same hijacking capability.
Conclusion: As we move toward general-purpose robot assistants, we must treat CoT tokens as safety-critical data. The industry needs "reasoning-checkers" that act as a watchdog, ensuring the robot's internal plan never deviates from the human's original command.
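As a toy illustration of such a watchdog, assuming access to the decoded CoT text and a closed vocabulary of object names (both simplifying assumptions, not part of the paper):

```python
def reasoning_check(instruction: str, cot_plan: str, known_objects: list[str]) -> bool:
    """Return False if the CoT plan targets an object the instruction never mentioned."""
    instructed = {o for o in known_objects if o in instruction.lower()}
    planned = {o for o in known_objects if o in cot_plan.lower()}
    return planned <= instructed  # False => plan deviates; halt and escalate

# reasoning_check("pick up the apple", "moving toward the knife",
#                 ["apple", "knife", "carrot"])  # -> False
```

A production checker would need far more than string matching (paraphrase, multilingual instructions, implicit goals), but the principle stands: verify the plan against the command before a single joint moves.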
