The paper introduces Uncertainty-Aware Policy Steering (UPS), a framework that calibrates Vision-Language Model (VLM) verifiers to adapt robot behaviors at deployment time. By integrating conformal prediction and a Bayesian intent model, UPS enables robots to decide whether to execute an action, ask for linguistic clarification, or request human intervention for policy retraining.
TL;DR
Deep generative policies (like Diffusion Policy) give robots diverse skills, but they don't always know when they are out of their depth. Uncertainty-Aware Policy Steering (UPS) is a new framework that uses calibrated Vision-Language Models (VLMs) as a "brain" that monitors the "body." It doesn't just pick an action; it statistically decides when to act (high confidence), ask (ambiguous instructions), or learn (when the policy is physically incapable).
The Problem: The Overconfidence of "Silicon Brains"
Modern robotics often uses a "verify-and-steer" approach: a base policy generates multiple candidate action samples, and a VLM selects the best one given a text prompt. However, VLMs suffer from agreement bias (a tendency to judge proposed behaviors as correct) and from overconfidence.
- Semantic Ambiguity: If a user says "place the cup in a bin" and there are two bins, an uncalibrated VLM may confidently pick the wrong one.
- Physical Incapability: If the robot's low-level policy simply cannot perform the task, the VLM may still pick the "least bad" (but still failing) sample instead of admitting defeat.
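The second failure mode follows directly from how naive steering selects a sample. A minimal sketch, assuming the verifier is any function mapping a candidate to a success score (the name `verify_and_steer` and the scores are hypothetical, not from the paper): an argmax always returns some candidate, however low every score is.

```python
from typing import Callable, Sequence


def verify_and_steer(samples: Sequence[str], score: Callable[[str], float]) -> str:
    """Naive steering: return whichever sample the verifier scores highest.

    argmax has no way to say "none of these will work": it returns a sample
    even when every score is low (the "least bad" failure mode).
    """
    return max(samples, key=score)


# Hypothetical verifier scores (estimated success probability) for three plans.
scores = {"place cup in left bin": 0.12, "place cup in right bin": 0.10, "drop cup": 0.02}
chosen = verify_and_steer(list(scores), scores.__getitem__)
print(chosen)  # place cup in left bin
```

Nothing in this rule can express abstention, which is the gap UPS's calibrated "None of the Above" option targets.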
Methodology: UPS – The Triple Threat to Uncertainty
1. Interleaved Imagination & Narration
To let the VLM "see" the future, UPS uses a latent world model (Dreamer-v3). It interleaves action chunks from the policy with the world model's predictions to generate long-horizon "mental videos." These videos are then translated into textual narrations (e.g., "The robot places the cup in the left bin").
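The interleaving can be sketched as a loop that alternates between querying the policy on an imagined state and rolling the world model forward. This is a toy sketch under stated assumptions: the latent state is a single float, `policy` and `dynamics` are hypothetical stand-ins for the base policy and the Dreamer-v3 latent dynamics, and the VLM narration step is not modeled.

```python
from typing import Callable, List

Latent = float  # toy stand-in for a world-model latent state
Action = float  # toy stand-in for one low-level action


def imagine_rollout(
    z0: Latent,
    policy: Callable[[Latent], List[Action]],      # returns an action chunk
    dynamics: Callable[[Latent, Action], Latent],  # one-step latent prediction
    num_chunks: int,
) -> List[Latent]:
    """Interleave policy action chunks with world-model predictions.

    After each chunk is executed in imagination, the policy is re-queried
    from the *predicted* latent, so the horizon can span many chunks.
    """
    trajectory = [z0]
    z = z0
    for _ in range(num_chunks):
        chunk = policy(z)         # query the policy on the imagined state
        for a in chunk:
            z = dynamics(z, a)    # predict the next latent
            trajectory.append(z)
    return trajectory


def toy_policy(z: Latent) -> List[Action]:
    # Push forward in steps of 0.5 until the imagined state reaches 1.0, then hold.
    return [0.5, 0.5, 0.5] if z < 1.0 else [0.0, 0.0, 0.0]


def toy_dynamics(z: Latent, a: Action) -> Latent:
    return z + a


trajectory = imagine_rollout(0.0, toy_policy, toy_dynamics, num_chunks=2)
print(trajectory)  # [0.0, 0.5, 1.0, 1.5, 1.5, 1.5, 1.5]
```

The second chunk differs from the first only because the policy saw the predicted state, which is what distinguishes interleaving from rolling out one fixed open-loop chunk.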

2. Bayesian Intent Factorization
Instead of asking a VLM "Is this action good?" directly, UPS factorizes the problem into two terms:
- P(θ|L): Given the instruction L, what hidden intents θ could lie behind it? (e.g., "The user might be left-handed or right-handed.")
- P(y|ℓ, θ): Given an intent θ, how likely is the candidate action (described by ℓ) to succeed (y)?
Reasoning over intents explicitly prevents the VLM from collapsing onto a single, possibly wrong interpretation.
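The summary does not spell out how the two factors are combined. One natural reading, assumed here, is to marginalize over intents, treating success as independent of the raw instruction once the intent is known: P(y | ℓ, L) = Σ_θ P(y | ℓ, θ) P(θ | L). The sketch below (function name and numbers hypothetical) shows why this keeps ambiguity visible: with a 50/50 intent prior, two plans that each satisfy only one intent receive equal scores instead of one being picked with false confidence.

```python
from typing import Dict


def success_prob(
    intent_prior: Dict[str, float],          # P(theta | L) for each intent theta
    success_given_intent: Dict[str, float],  # P(y = 1 | l, theta) for one candidate
) -> float:
    """Marginalize over intents: sum_theta P(y | l, theta) * P(theta | L)."""
    if abs(sum(intent_prior.values()) - 1.0) > 1e-6:
        raise ValueError("intent prior must sum to 1")
    return sum(p * success_given_intent[theta] for theta, p in intent_prior.items())


# "Place the cup in a bin" with two bins: the instruction alone splits intent 50/50.
prior = {"left bin": 0.5, "right bin": 0.5}
# Hypothetical per-intent success judgments for two candidate plans.
left_plan = {"left bin": 0.9, "right bin": 0.05}
right_plan = {"left bin": 0.05, "right bin": 0.9}

print(success_prob(prior, left_plan), success_prob(prior, right_plan))  # both ≈ 0.475
```

A tie like this is exactly the situation in which asking a clarifying question is more useful than guessing.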
3. Conformal Prediction (CP) for Statistical Safety
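UPS applies Conformal Prediction (CP) to turn the VLM's scores into a "prediction set" of candidate actions. Using held-out calibration data, CP guarantees that this set contains a correct option with a user-chosen probability 1 − α (assuming calibration and deployment data are exchangeable), so the set's contents become a principled decision signal. If the set contains one action, the robot acts. If it contains multiple, it asks a clarifying question. If it contains a special "None of the Above" token, it triggers Residual Policy Learning to collect human help and update the model.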
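A minimal split-conformal sketch of the act/ask/learn rule, assuming a nonconformity score of 1 minus the model's probability for the correct option, with calibration examples in which no sample succeeds labeled with the "None of the Above" option. The function names, the empty-set handling, and all numbers are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import Dict, List, Sequence

NONE = "none of the above"  # special option: no candidate sample can succeed


def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold from calibration nonconformity scores.

    Each score is 1 - p(correct option) on a held-out calibration example.
    Returns the ceil((n + 1)(1 - alpha))-th smallest score.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too little calibration data: keep every option
    return sorted(cal_scores)[k - 1]


def prediction_set(probs: Dict[str, float], qhat: float) -> List[str]:
    """Keep every option whose nonconformity score 1 - p is within the threshold."""
    return [option for option, p in probs.items() if 1.0 - p <= qhat]


def decide(pred_set: List[str]) -> str:
    if NONE in pred_set:
        return "learn"  # request human intervention and residual policy learning
    if len(pred_set) == 1:
        return "act"    # a single option survives: execute it
    if len(pred_set) > 1:
        return "ask"    # several plausible options: ask a clarifying question
    return "learn"      # empty set (assumption): treat like incapability


cal_scores = [0.05, 0.1, 0.15, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7]
qhat = conformal_threshold(cal_scores, alpha=0.2)  # 9th smallest score: 0.6

clear = {"left bin": 0.9, "right bin": 0.07, NONE: 0.03}
ambiguous = {"left bin": 0.48, "right bin": 0.47, NONE: 0.05}
incapable = {"left bin": 0.1, "right bin": 0.1, NONE: 0.8}
for probs in (clear, ambiguous, incapable):
    print(decide(prediction_set(probs, qhat)))  # act, ask, learn
```

Lowering α raises the threshold and enlarges the sets, trading more questions and interventions for a stronger coverage guarantee.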
Experiments & Results: Efficiency in Action
The authors tested UPS on a Franka Panda robot in a "cup sorting" task.
- Higher Accuracy: In ambiguous tasks, UPS's success rate reached 85%, a 30% jump over uncalibrated steering (Forewarn).
- Lower Human Cost: UPS only asks for physical help when it is truly incapable. Its intervention rate was nearly 3x lower than that of traditional DAgger methods.

Detailed Insight: The Value of "None of the Above"
The most striking contribution is the robot's ability to recognize incapability. By including a "none" option in the calibrated set, the system bridges the gap between high-level reasoning and low-level motor control. When the robot says, "I can't find a single way to do this," it initiates a residual learning loop that fine-tunes the policy without "catastrophic forgetting" of its original skills.
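Residual policy learning keeps the original policy frozen and learns only a correction added to its output. A toy sketch, assuming a 1-D observation and action, a linear residual fitted by least squares, and hypothetical function names (a real system would train a neural residual on the collected intervention data): because the base policy's parameters are never updated, its original behavior remains available underneath the correction.

```python
from typing import Callable, List, Tuple

Policy = Callable[[float], float]


def fit_linear_residual(data: List[Tuple[float, float]], base_policy: Policy) -> Policy:
    """Fit r(o) = w * o + b to the gap between human actions and the frozen base policy.

    Only w and b are learned; base_policy is called but never modified.
    """
    xs = [o for o, _ in data]
    ys = [a - base_policy(o) for o, a in data]  # residual targets
    n = len(data)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x if var_x > 0 else 0.0
    b = mean_y - w * mean_x
    return lambda o: w * o + b


def steered(base_policy: Policy, residual: Policy) -> Policy:
    """Deployed policy: base action plus learned correction."""
    return lambda o: base_policy(o) + residual(o)


def base_policy(o: float) -> float:
    return 2.0 * o  # hypothetical pretrained behavior


# Hypothetical human corrections: the new task needs an extra offset of +1.
interventions = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
residual = fit_linear_residual(interventions, base_policy)
policy = steered(base_policy, residual)
print(policy(3.0), base_policy(3.0))  # 7.0 6.0
```

The base policy still returns its original action; only the composed policy reflects the correction.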
Conclusion & Future Outlook
UPS represents a shift from "black-box" execution to calibrated autonomy. By providing a mathematical framework for a robot to say "I'm not sure" or "I need more training," we move closer to robots that can safely operate in complex, unpredictable human environments.
Limitations: The system currently assumes the world model's "imaginations" are accurate. Future iterations will likely need to account for "imagination uncertainty" to handle even more chaotic real-world physics.
For more technical details, check out the project page: https://jessie-yuan.github.io/ups/
