[RSS 2025] When to Act, Ask, or Learn: Taming Overconfident VLMs for Robust Robot Steering
Abstract

The paper introduces Uncertainty-Aware Policy Steering (UPS), a framework that calibrates Vision-Language Model (VLM) verifiers to adapt robot behaviors at deployment time. By integrating conformal prediction and a Bayesian intent model, UPS enables robots to decide whether to execute an action, ask for linguistic clarification, or request human intervention for policy retraining.

TL;DR

Deep generative policies (like Diffusion Policy) give robots diverse skills, but they cannot tell when they are out of their depth. Uncertainty-Aware Policy Steering (UPS) is a new framework that uses calibrated Vision-Language Models (VLMs) as a "brain" that monitors the "body." Rather than just picking an action, it uses statistical calibration to decide when to act (high confidence), ask (ambiguous instructions), or learn (the policy is physically incapable).

The Problem: The Overconfidence of "Silicon Brains"

Modern robotics often uses a "verify-and-steer" approach: a base policy generates multiple possible action samples, and a VLM selects the best one based on a text prompt. However, VLMs suffer from agreement bias and overconfidence.

  1. Semantic Ambiguity: If a user says "place the cup in a bin" (and there are two bins), an uncalibrated VLM might confidently pick the wrong one.
  2. Physical Incapability: If the robot's low-level policy simply doesn't know how to perform a task, the VLM might still force-pick the "least bad" (but still failing) sample instead of admitting defeat.

Methodology: UPS – The Triple Threat to Uncertainty

1. Interleaved Imagination & Narration

To let the VLM "see" the future, UPS uses a Latent World Model (Dreamer-v3). It interleaves action chunks from the policy with the world model's predictions to generate long-horizon "mental videos." These videos are then translated into textual narrations (e.g., "The robot places the cup in the left bin").
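
The interleaving described above can be illustrated with a toy loop. This is only a sketch under strong assumptions: the latent state is a single number, and `toy_policy`, `toy_world_model`, `imagine_rollout`, and `narrate` are hypothetical stand-ins. In UPS the policy is a generative policy, the world model is Dreamer-v3, and the narration is produced from the imagined video rather than by a hand-written rule.

```python
from typing import Callable, List, Tuple

State = float              # toy 1-D latent state (assumption for illustration)
ActionChunk = List[float]  # a short sequence of low-level actions


def imagine_rollout(
    state: State,
    policy: Callable[[State], ActionChunk],
    world_model: Callable[[State, ActionChunk], State],
    num_chunks: int,
) -> List[Tuple[ActionChunk, State]]:
    """Alternate policy proposals with world-model predictions."""
    trajectory: List[Tuple[ActionChunk, State]] = []
    for _ in range(num_chunks):
        chunk = policy(state)              # policy acts on the imagined state
        state = world_model(state, chunk)  # model predicts the resulting state
        trajectory.append((chunk, state))
    return trajectory


def narrate(trajectory: List[Tuple[ActionChunk, State]]) -> str:
    """Hand-written stand-in for turning an imagined rollout into text."""
    final_state = trajectory[-1][1]
    side = "left" if final_state < 0 else "right"
    return f"The robot places the cup in the {side} bin."


def toy_policy(state: State) -> ActionChunk:
    return [-0.5, -0.5]  # always move toward the left (negative) side


def toy_world_model(state: State, chunk: ActionChunk) -> State:
    return state + sum(chunk)  # trivially integrate the actions


rollout = imagine_rollout(0.0, toy_policy, toy_world_model, num_chunks=3)
narration = narrate(rollout)
print(narration)
```

The point of the loop is that each new action chunk is conditioned on an imagined state, not an observed one, which is what lets the rollout extend beyond a single chunk.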

[Figure: Outcome Prediction & Narration]

2. Bayesian Intent Factorization

Instead of asking a VLM "Is this action good?", UPS factorizes the problem:

  • P(θ|L): What are the possible hidden intents behind this vague instruction? (e.g., "User might be left-handed or right-handed").
  • P(y|ℓ, θ): Given an intent, how likely is this action to succeed?

This factorization prevents the VLM from collapsing onto a single, likely-wrong answer.
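
One natural way to combine the two factors is to marginalize over intents, weighting each action's success probability by how plausible each intent is. Whether UPS combines them in exactly this way is an assumption here. The sketch below uses a discrete intent set and made-up probabilities; `intent_prior`, `success_given_intent`, and `marginal_success` are hypothetical names, and in UPS these quantities would come from VLM queries rather than hard-coded tables.

```python
from typing import Dict


def marginal_success(
    intent_prior: Dict[str, float],
    success_given_intent: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """score(action) = sum over intents of P(intent) * P(success | action, intent)."""
    total = sum(intent_prior.values())
    scores: Dict[str, float] = {}
    for intent, prior in intent_prior.items():
        weight = prior / total  # normalize in case the priors do not sum to 1
        for action, p_success in success_given_intent[intent].items():
            scores[action] = scores.get(action, 0.0) + weight * p_success
    return scores


# Made-up numbers for the "place the cup in a bin" example with two bins.
intent_prior = {"left_bin": 0.5, "right_bin": 0.5}
success_given_intent = {
    "left_bin": {"place_left": 0.9, "place_right": 0.1},
    "right_bin": {"place_left": 0.1, "place_right": 0.9},
}

scores = marginal_success(intent_prior, success_given_intent)
print(scores)
```

With a symmetric prior, both actions end up with the same marginal score, so neither can be selected confidently. That is the ambiguity signal a single "is this action good?" query tends to hide.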

3. Conformal Prediction (CP) for Statistical Safety

UPS applies Conformal Prediction to create a "prediction set" of actions. If the set contains one action, the robot acts. If it contains multiple, it asks a clarifying question. If it contains a special "None of the Above" token, it triggers Residual Policy Learning to collect human help and update the model.
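
The act/ask/learn rule can be sketched with standard split conformal prediction. This is a minimal illustration, not the UPS score function: the nonconformity score `1 - p`, the calibration scores, the `NONE_TOKEN` label, and all function names are assumptions. The source also does not say how to handle an empty set or a set that mixes "none" with real actions; this sketch routes both to "learn".

```python
import math
from typing import Dict, List, Set

NONE_TOKEN = "none_of_the_above"  # hypothetical label for the special option


def conformal_threshold(cal_scores: List[float], alpha: float) -> float:
    """Split-conformal quantile of nonconformity scores for coverage 1 - alpha."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank of the quantile
    if rank > n:
        return float("inf")  # too few calibration points: keep every option
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs: Dict[str, float], qhat: float) -> Set[str]:
    """Keep every option whose nonconformity score 1 - p is within the threshold."""
    return {option for option, p in probs.items() if 1.0 - p <= qhat}


def decide(pred_set: Set[str]) -> str:
    """Map a prediction set to act / ask / learn (tie-breaking is an assumption)."""
    if NONE_TOKEN in pred_set or not pred_set:
        return "learn"
    if len(pred_set) == 1:
        return "act"
    return "ask"


# Made-up calibration scores: 1 - p(true option) on held-out examples.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.58, 0.6, 0.7]
qhat = conformal_threshold(cal_scores, alpha=0.2)

confident = {"place_left": 0.90, "place_right": 0.07, NONE_TOKEN: 0.03}
ambiguous = {"place_left": 0.45, "place_right": 0.45, NONE_TOKEN: 0.10}
incapable = {"place_left": 0.20, "place_right": 0.10, NONE_TOKEN: 0.70}

for name, probs in [("confident", confident), ("ambiguous", ambiguous), ("incapable", incapable)]:
    print(name, decide(prediction_set(probs, qhat)))
```

The threshold is computed once from calibration data and then reused at deployment, so the size of the prediction set, not the raw VLM confidence, drives the decision.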

Experiments & Results: Efficiency in Action

The authors tested UPS on a Franka Panda robot in a "cup sorting" task.

  • Higher Accuracy: In ambiguous tasks, UPS reached an 85% success rate, a 30% jump over uncalibrated steering (Forewarn).
  • Lower Human Cost: UPS only asks for physical help when it is truly incapable. Its intervention rate was nearly 3x lower than traditional DAgger methods.

[Figure: Uncertainty Quantification Results]

Detailed Insight: The Value of "None of the Above"

The most striking contribution is the robot's ability to recognize incapability. By including a "none" option in the calibrated set, the system bridges the gap between high-level reasoning and low-level motor control. When the robot says, "I can't find a single way to do this," it initiates a residual learning loop that fine-tunes the policy without "catastrophic forgetting" of its original skills.

Conclusion & Future Outlook

UPS represents a shift from "black-box" execution to calibrated autonomy. By providing a mathematical framework for a robot to say "I'm not sure" or "I need more training," we move closer to robots that can safely operate in complex, unpredictable human environments.

Limitations: The system currently assumes the world model's "imaginations" are accurate. Future iterations will likely need to account for "imagination uncertainty" to handle even more chaotic real-world physics.


For more technical details, check out the project page: https://jessie-yuan.github.io/ups/
