[ICRA 2024] IGV-RRT: Solving the "Stale Map" Problem in Active Object Search

This paper introduces IGV-RRT, a probabilistic planning framework for Object Goal Navigation (ObjectNav) in temporally changing indoor environments. It combines a 3D Scene Graph-based Information Gain Map (IGM) for global guidance with an online VLM Score Map (VLM-SM) to achieve State-of-the-Art search efficiency and success rates by correcting stale historical priors with real-time semantic evidence.

TL;DR

Navigating indoor environments is hard when humans move the furniture. IGV-RRT solves this by fusing "what we remember" (3D Scene Graphs) with "what we see right now" (VLM-based semantic scores). By integrating these into a real-time RRT planner, robots can efficiently find relocated objects, achieving a 24.7% relative improvement in success rate over prior semantic-mapping baselines.

Problem & Motivation: The "Static Map" Trap

Most robotic navigation systems assume the world is a museum—static and unchanging. They build a 3D Scene Graph (3DSG), map out where the "couch" and "table" are, and use these as anchors to find smaller objects.

However, in real homes, things move. If a robot's prior knowledge says the "Remote Control" is usually near the "Sofa," but the Sofa has been moved to another room, the robot gets trapped in a loop of searching an empty space. Current Vision-Language Model (VLM) approaches try to fix this by looking at every frame, but they lack global "intuition" and often waste time in redundant exploration.

Methodology: The Power of Dual-Layer Mapping

The core innovation of IGV-RRT is Prior-Real-Time Observation Fusion. It doesn't choose between memory and sight; it weighs them dynamically.

1. The Information Gain Map (IGM) - The Global Memory

The robot builds a 3DSG using YOLOv7 and ConceptNet. This creates a "probability field" (using a Gaussian Mixture Model) of where a target should be based on commonsense (e.g., "Mugs are usually near Coffee Machines").
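The GMM-based "probability field" can be sketched as follows. This is a minimal illustration, not the paper's implementation: the anchor objects, their positions, the relatedness weights, and the spread `SIGMA` are all assumed placeholders standing in for what the 3DSG and ConceptNet would supply.

```python
import math

# Hypothetical anchors from the 3D Scene Graph, each with a 2D position
# and a commonsense relatedness weight to the target (e.g. "mug").
# Names, positions, and weights are illustrative, not from the paper.
anchors = [
    {"name": "coffee_machine", "pos": (2.0, 1.0), "weight": 0.9},
    {"name": "sink",           "pos": (5.0, 4.0), "weight": 0.4},
]

SIGMA = 0.75  # assumed spatial spread (metres) of each Gaussian component

def prior_probability(x, y):
    """Evaluate the GMM 'probability field' for the target at (x, y).

    Each scene-graph anchor contributes an isotropic Gaussian centred at
    its position, scaled by its normalised commonsense relatedness weight.
    """
    total_weight = sum(a["weight"] for a in anchors)
    density = 0.0
    for a in anchors:
        dx, dy = x - a["pos"][0], y - a["pos"][1]
        g = math.exp(-(dx * dx + dy * dy) / (2.0 * SIGMA ** 2))
        g /= 2.0 * math.pi * SIGMA ** 2  # normalise the 2D Gaussian
        density += (a["weight"] / total_weight) * g
    return density

# The field peaks near the most strongly related anchor:
print(prior_probability(2.0, 1.0))  # near the coffee machine
print(prior_probability(5.0, 4.0))  # near the weaker sink anchor
```

The mixture weights encode the commonsense link strength, so the field is highest where strongly related anchors sit and decays smoothly away from them.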

2. The VLM Score Map (VLM-SM) - The Real-Time Corroborator

As the robot moves, it uses BLIP-2 to evaluate the current view. The authors use a multi-prompt strategy (asking the VLM about the object name, its context, and its room type) to create a high-contrast semantic map. If the robot sees a high semantic score in a place the "Memory" didn't expect, the map updates to reflect the new reality.
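The multi-prompt idea can be sketched like this. `score_image_text` is a toy stand-in for a real VLM image-text matching call (e.g. BLIP-2); the stub keys on detected view labels purely for illustration, and the prompt wording is an assumption, not the paper's exact templates.

```python
def score_image_text(view_labels, prompt):
    """Toy stand-in for a VLM image-text matching score in [0, 1]."""
    hits = sum(1 for word in prompt.lower().split() if word in view_labels)
    return min(1.0, hits / 2.0)

def multi_prompt_score(view_labels, target, context, room):
    """Average the VLM score over object-, context-, and room-level
    prompts, mirroring the multi-prompt strategy described above."""
    prompts = [
        f"a photo of a {target}",
        f"a {target} near a {context}",
        f"a photo of a {room}",
    ]
    return sum(score_image_text(view_labels, p) for p in prompts) / len(prompts)

# A kitchen view supporting all three prompt levels scores well above
# an unrelated bathroom view, producing the high-contrast semantic map:
kitchen = {"mug", "coffee", "machine", "kitchen"}
bathroom = {"sink", "mirror", "bathroom"}
print(multi_prompt_score(kitchen, "mug", "coffee machine", "kitchen"))
print(multi_prompt_score(bathroom, "mug", "coffee machine", "kitchen"))
```

Averaging over the three prompt levels is what sharpens the contrast: a view must agree with the object, its context, and its room type to score highly.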

Fig 1: Overview of the IGV-RRT pipeline showing the fusion of IGM (prior) and VLM-SM (real-time).

3. IGV-RRT Planning: Smart Tree Expansion

The planner evaluates candidate nodes $v$ using a joint utility function: $$U_{final}(v) = \lambda_d \cdot (1 - D(v)) + \mathbb{I}(v \notin \mathcal{M}_{exp}) \cdot [ \lambda_e \cdot E(v) + \lambda_s \cdot S(v) ]$$

  • $E(v)$: Information gain from the prior (Go where we think the object is).
  • $S(v)$: VLM semantic support (Go where we actually see evidence).
  • $\mathbb{I}(v \notin \mathcal{M}_{exp})$: The "Explored-Region Gating"—a crucial mechanism that stops the robot from revisiting the same spot twice.
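The utility above maps directly to a few lines of code. This is a minimal sketch of the formula: the weight values `lam_d`, `lam_e`, `lam_s` are illustrative placeholders, not the paper's tuned parameters.

```python
def u_final(d, e, s, explored, lam_d=0.3, lam_e=0.4, lam_s=0.3):
    """U_final(v) = lam_d*(1 - D(v)) + 1[v not in M_exp]*(lam_e*E(v) + lam_s*S(v)).

    d:        normalised distance cost D(v) in [0, 1]
    e:        prior information gain E(v) from the IGM
    s:        VLM semantic support S(v) from the VLM-SM
    explored: True if v lies inside the explored region M_exp
    """
    gate = 0.0 if explored else 1.0  # Explored-Region Gating indicator
    return lam_d * (1.0 - d) + gate * (lam_e * e + lam_s * s)

# The same high-evidence node is attractive while unexplored, but once it
# falls inside M_exp only the distance term remains, suppressing revisits:
print(u_final(d=0.2, e=0.8, s=0.9, explored=False))
print(u_final(d=0.2, e=0.8, s=0.9, explored=True))
```

Note how the gate zeroes out both the prior and the VLM terms together: an already-explored node can only be worth its travel convenience, never its (stale) semantic promise.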

Experiments: Proving the Resilience

The team tested IGV-RRT in the HM3D (Habitat-Matterport 3D) simulator and on a Wheeltec R550 physical robot.

SOTA Comparison

In environments where objects were moved after the initial map was built, IGV-RRT crushed the VLFM baseline:

  • Success Rate (SR): 42.9% vs. 34.4%
  • Path Efficiency (SPL): 26.3% vs. 16.7%

Fig 2: Trajectory comparison. IGV-RRT (red) uses VLM evidence to correct its path early, while the baseline (green) wanders aimlessly.

Ablation Insights

The ablation study shows that VLM-SM (semantic scoring) and Explored-Region Gating (revisit suppression) are synergistic. Without the gating, the robot frequently gets stuck in "semantic traps," repeatedly checking a high-scoring but empty area.

Deep Insight & Conclusion

IGV-RRT succeeds because it treats historical data as a soft bias rather than a hard constraint. By mathematically weighing the entropy of the prior (IGM) against the confidence of the real-time VLM, it achieves a "Bayesian-like" balance in motion planning.
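One way to picture this "soft bias" is an entropy-vs-confidence blend. The sketch below is an illustration of the idea, not the paper's actual formula: the fusion rule, the normalised-entropy weight, and the example distributions are all assumptions.

```python
import math

def fuse(prior_score, vlm_score, prior_dist, vlm_conf):
    """Blend prior and live-VLM scores: a sharp (low-entropy) prior keeps
    influence, while a near-uniform stale prior defers to confident sight."""
    # Normalised entropy of the prior distribution over candidate regions.
    h = -sum(p * math.log(p) for p in prior_dist if p > 0)
    h_norm = h / math.log(len(prior_dist))  # in [0, 1]
    w_prior = 1.0 - h_norm                  # sharp prior -> trust memory
    w_vlm = vlm_conf                        # confident VLM -> trust sight
    total = (w_prior + w_vlm) or 1.0
    return (w_prior * prior_score + w_vlm * vlm_score) / total

# With a near-uniform (stale) prior, the fused value tracks the VLM;
# with a sharp prior, memory still pulls the result toward its score:
stale = [0.26, 0.25, 0.25, 0.24]
sharp = [0.90, 0.05, 0.03, 0.02]
print(fuse(0.9, 0.2, stale, vlm_conf=0.8))
print(fuse(0.9, 0.2, sharp, vlm_conf=0.8))
```

The qualitative behaviour matches the insight above: the flatter (more uncertain) the prior becomes, the less it can overrule real-time evidence.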

Limitations: The current IGM is still "frozen" once constructed. While the robot can ignore it, it can't yet "rewrite" its long-term memory to say "The Sofa is now in Room B."

Future Outlook: Transitioning this into Long-term Autonomy (LTA), where the robot maintains an evolving 3D Scene Graph over weeks or months, will be the next frontier in embodied AI.


Paper Reference: "IGV-RRT: Prior-Real-Time Observation Fusion for Active Object Search in Changing Environments"
