[CVPR 2025] UniQueR: Breaking the 2.5D Barrier with Unified Query-based 3D Reconstruction
Abstract

UniQueR is a unified, query-based feedforward framework for 3D reconstruction from unposed images. By representing scenes as sparse 3D learnable queries that spawn 3D Gaussians, it achieves state-of-the-art (SOTA) rendering quality on datasets like Mip-NeRF 360 and VR-NeRF while remaining efficient.

TL;DR

UniQueR is a breakthrough in feedforward 3D reconstruction that moves away from view-dependent depth maps. By using sparse 3D queries to represent a scene, it can reconstruct occluded regions and "hallucinate" missing geometry in a single forward pass. It achieves SOTA results on Mip-NeRF 360 using 15x fewer primitives and running 2.4x faster than previous methods like AnySplat.

Context: The Limitations of "View-Anchored" 3D

Recent "Splatting" models (like AnySplat or MVSplat) have moved the needle for real-time reconstruction. However, they almost all share a fundamental flaw: they are 2.5D. They predict Gaussians for every pixel in the input images.

If a surface is hidden behind a chair or a wall in every input view, these models simply don't "see" it and cannot place geometry there. This leads to large "holes" when the scene is rendered from a new perspective.
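To make the 2.5D limitation concrete, here is a minimal numpy sketch of view-anchored reconstruction: each input pixel is back-projected along its camera ray by a predicted depth, so the number of primitives is fixed at one per pixel and no geometry can ever appear behind an occluder. The function name and toy intrinsics are illustrative, not from any of the cited methods.

```python
import numpy as np

def unproject_per_pixel(depth, K):
    """View-anchored (2.5D) reconstruction: one 3D point per input pixel.

    depth : (H, W) predicted depth map for a single input view
    K     : (3, 3) camera intrinsics
    Returns (H*W, 3) points in camera space. Geometry can only exist
    where a pixel of this view sees a surface; occluded regions get nothing.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)  # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T        # back-projected viewing rays
    return rays * depth.reshape(-1, 1)     # scale each ray by its depth

# Toy example: a 4x4 depth map yields exactly 16 primitives, never more.
K = np.array([[2.0, 0.0, 2.0], [0.0, 2.0, 2.0], [0.0, 0.0, 1.0]])
pts = unproject_per_pixel(np.full((4, 4), 3.0), K)
print(pts.shape)  # (16, 3)
```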

The Core Insight: Sparse 3D Queries

UniQueR asks: What if we don't tie geometry to pixels?

Instead of predicting a Gaussian for every pixel, UniQueR uses a fixed set of 3D Queries. These queries are spatial anchors that live in global 3D space.

  1. Hybrid Initialization: Half the queries are seeded from initial point-cloud guesses, while the other half are spread uniformly to "scout" for occluded geometry.
  2. Decoupled Cross-Attention: To keep computation low, queries "look" at image features via cross-attention rather than doing full self-attention across every image patch.
  3. Gaussian Spawning: Each query acts as a "seed" that sprouts 64 small 3D Gaussians to capture fine details.
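The three steps above can be sketched in a few lines of numpy. This is a shape-level illustration only: the query count, feature dimension, point-cloud source, and projection weights are all placeholders, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
N_QUERIES, DIM, GAUSS_PER_QUERY = 128, 32, 64  # illustrative sizes

def hybrid_init(point_cloud, n_queries, bounds=1.0):
    """Step 1: half the query anchors are seeded from an initial point
    cloud; the other half are spread uniformly to scout occluded space."""
    half = n_queries // 2
    seeded = point_cloud[rng.choice(len(point_cloud), half)]
    uniform = rng.uniform(-bounds, bounds, size=(n_queries - half, 3))
    return np.concatenate([seeded, uniform])

def cross_attend(queries, image_feats):
    """Step 2: queries attend to image tokens (cross-attention only),
    so cost scales with n_queries * n_tokens, not n_tokens**2."""
    scores = queries @ image_feats.T / np.sqrt(queries.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ image_feats

def spawn_gaussians(anchors, feats, W_offset):
    """Step 3: each query sprouts a local cluster of small Gaussians
    (here: only their means, offset slightly from the anchor)."""
    offsets = (feats @ W_offset).reshape(len(anchors), GAUSS_PER_QUERY, 3)
    return anchors[:, None, :] + 0.05 * np.tanh(offsets)  # stay near the anchor

anchors = hybrid_init(rng.normal(size=(1000, 3)), N_QUERIES)
feats = cross_attend(rng.normal(size=(N_QUERIES, DIM)),
                     rng.normal(size=(4096, DIM)))
means = spawn_gaussians(anchors, feats, rng.normal(size=(DIM, GAUSS_PER_QUERY * 3)))
print(means.shape)  # (128, 64, 3)
```

Note how decoupling works out in the attention cost: 128 queries against 4096 image tokens is ~0.5M score entries, versus ~16M for full self-attention over the tokens alone.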

Figure: The UniQueR pipeline, from unposed images to global 3D queries and Gaussian Splatting.

Why It Works: Novel-View Supervision

The secret sauce is the training strategy. Unlike prior work that only supervises the views the model sees, UniQueR is supervised on held-out novel views.

If the model only reconstructs what's visible in the input, it gets "penalized" for the holes seen in the novel views. This forces the 3D queries to learn geometric priors—effectively teaching the model to "fill in the blanks" for occluded areas.
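A toy sketch of this supervision scheme, assuming hypothetical `reconstruct` and `render` stand-ins for the real model and rasterizer: the scene is built from the first few views, but the loss is computed only on held-out views, so anything the model fails to hallucinate shows up directly in the loss.

```python
import numpy as np

def photometric_loss(rendered, target):
    """L2 photometric loss between a rendered view and ground truth."""
    return float(np.mean((rendered - target) ** 2))

def training_loss(views, reconstruct, render, n_input=3):
    """Novel-view supervision: reconstruct from the first n_input views,
    supervise only on the held-out remainder."""
    inputs, heldout = views[:n_input], views[n_input:]
    scene = reconstruct([v["rgb"] for v in inputs])  # feedforward pass
    return float(np.mean([photometric_loss(render(scene, v["pose"]), v["rgb"])
                          for v in heldout]))

# Toy stand-ins: the "scene" is the mean input image and the "renderer"
# ignores pose -- just enough to exercise the supervision split.
reconstruct = lambda imgs: np.mean(imgs, axis=0)
render = lambda scene, pose: scene
views = [{"rgb": np.full((8, 8), float(i)), "pose": None} for i in range(5)]
print(training_loss(views, reconstruct, render))  # 6.5
```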

Benchmarks & Efficiency

UniQueR isn't just more complete; it’s leaner. Because it uses sparse queries rather than one-Gaussian-per-pixel, the memory footprint is drastically reduced.

Figure: Performance comparison across methods.

Key Wins:

  • Inference Speed: ~0.2s for 3 views (over 2x faster than competitors).
  • Memory: Only 11GB VRAM compared to 18GB for AnySplat.
  • Geometry: The depth maps (Abs Rel error) are significantly cleaner, proving that the model actually understands 3D structure rather than just "painting" colors.
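The memory win follows directly from the primitive counts. A back-of-the-envelope comparison, with an assumed resolution and query budget (these numbers are illustrative, not the paper's):

```python
# Per-pixel (2.5D) methods: one Gaussian per input pixel.
H, W, n_views = 512, 512, 3
per_pixel = H * W * n_views              # 786,432 primitives

# Query-based: a sparse query set, each spawning a small cluster.
n_queries, per_query = 1024, 64
query_based = n_queries * per_query      # 65,536 primitives

print(per_pixel, query_based, per_pixel / query_based)  # 786432 65536 12.0
```

Under these assumed sizes the sparse representation uses ~12x fewer primitives; the paper's reported 15x presumably reflects its actual resolutions and query budget.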

Figure: Qualitative comparison. UniQueR (right) fills in occluded areas where 2.5D methods (left) leave empty voids.

Critical Analysis

While UniQueR is a massive step forward for static scene reconstruction, it currently assumes the world is still. The next frontier will be extending these 3D spatial queries to handle temporal dynamics (moving people or cars).

Furthermore, the reliance on DINOv2 features means it has strong semantic priors, but it might struggle with extremely out-of-distribution textures not found in its training sets.

Conclusion

UniQueR elegantly solves the "occlusion problem" in feedforward 3D reconstruction by decoupling representation from the camera frustum. It proves that sparse queries are not just for object detection (DETR)—they might be the most efficient way to represent the entire 3D world.
