[CVPR 2025] SR3R: Rethinking 3D Super-Resolution via Feed-Forward Gaussian Splatting
Abstract

This paper introduces SR3R, a feed-forward framework for 3D Super-Resolution (3DSR) built on 3D Gaussian Splatting (3DGS). It reformulates 3DSR as a direct mapping from sparse low-resolution (LR) views to a high-resolution (HR) 3DGS representation, achieving state-of-the-art performance and strong zero-shot generalization without per-scene optimization.

Executive Summary

TL;DR: SR3R (Super-Resolution 3D Reconstruction) recasts 3D super-resolution from a slow, per-scene optimization process into a single feed-forward mapping. By training on large-scale datasets, it learns "3D-native" high-frequency textures that generic 2D super-resolution models miss, enabling high-fidelity 3D reconstruction from as few as two low-resolution (LR) views.

Background: Most 3D Gaussian Splatting (3DGS) models require dense, high-resolution inputs. Existing 3DSR methods work around this by upsampling the LR views with a 2D super-resolution model and using the results as pseudo-labels, but this introduces view inconsistency and does not scale, because every scene still has to be optimized separately. SR3R instead moves toward generalized, data-driven 3D refinement.

Problem & Motivation: The "2D Prior" Ceiling

Why can't we just use a 2D Super-Resolution (2DSR) model and then run 3DGS?

  1. View Inconsistency: 2D models process each image independently; textures "flicker" or shift when projected into 3D.
  2. Optimization Bottleneck: Current 3DSR methods require 5 to 10 minutes of optimization per scene.
  3. Pre-defined Priors: 2DSR models are trained on natural images, not multi-view 3D layouts, leading to hallucinated artifacts that don't satisfy geometric constraints.

Methodology: The Core Architecture

SR3R's main contribution is a three-stage pipeline that runs entirely feed-forward.

1. The Gaussian Shuffle Split (Scaffold)

Instead of starting from scratch, SR3R takes a coarse LR 3DGS (from a backbone like DepthSplat) and "densifies" it. Through a Shuffle Split operation, each LR Gaussian is split into six sub-Gaussians. This provides a structural "scaffold" for the network to refine.
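
A minimal sketch of the one-to-six densification idea follows. The summary does not say how the six sub-Gaussians are placed, so the placement rule here (children at plus and minus offsets along each axis, shrunk by a hypothetical `shrink` factor, with axis-aligned scales and inherited color and opacity) is an assumption for illustration only, not the paper's Shuffle Split.

```python
import numpy as np

def shuffle_split(means, scales, colors, opacities, shrink=0.5):
    """Split each of N Gaussians into 6 children (hypothetical placement rule).

    means:     (N, 3) centers
    scales:    (N, 3) per-axis extents (axis-aligned for brevity; assumption)
    colors:    (N, 3) RGB
    opacities: (N,)   alpha
    """
    # Six unit directions, in the order +x, +y, +z, -x, -y, -z.
    directions = np.concatenate([np.eye(3), -np.eye(3)], axis=0)  # (6, 3)

    # Each child is shifted along one axis by a fraction of the parent's extent.
    offsets = directions[None, :, :] * scales[:, None, :] * shrink  # (N, 6, 3)
    child_means = (means[:, None, :] + offsets).reshape(-1, 3)      # (6N, 3)

    # Children are smaller copies; appearance is inherited unchanged.
    child_scales = np.repeat(scales * shrink, 6, axis=0)
    child_colors = np.repeat(colors, 6, axis=0)
    child_opacities = np.repeat(opacities, 6, axis=0)
    return child_means, child_scales, child_colors, child_opacities
```

Whatever the exact rule, the output has six times as many Gaussians as the input, which is what gives the later stages room to place finer detail.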

2. ViT-based Feature Refinement

The model extracts features from the LR views using a ViT encoder. Crucially, it uses bidirectional cross-attention to align these 2D features with the 3D-aware tokens from the reconstruction backbone. This suppresses 2D artifacts and ensures the features are "geometry-ready."
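
To make "bidirectional cross-attention" concrete, here is a stripped-down sketch in which image tokens attend to geometry tokens and vice versa. It is single-head and omits the learned query/key/value projections and normalization layers a real transformer block would have; the function names and the residual update are assumptions, not the paper's layer design.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, context):
    """Each query token gathers a weighted average of the context tokens."""
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d))  # (Nq, Nc), rows sum to 1
    return weights @ context                              # (Nq, d)

def bidirectional_cross_attention(img_tokens, geo_tokens):
    # Both directions read the original tokens, then add a residual update.
    img_out = img_tokens + cross_attend(img_tokens, geo_tokens)
    geo_out = geo_tokens + cross_attend(geo_tokens, img_tokens)
    return img_out, geo_out

rng = np.random.default_rng(0)
img = rng.normal(size=(5, 8))   # 5 hypothetical ViT tokens, dimension 8
geo = rng.normal(size=(7, 8))   # 7 hypothetical 3D-aware backbone tokens
img_out, geo_out = bidirectional_cross_attention(img, geo)
print(img_out.shape, geo_out.shape)
```

The point of running attention in both directions is that each token set is updated with information from the other, so the image features are conditioned on geometry rather than processed view by view in isolation.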

3. Gaussian Offset Learning

This is the key design choice. Instead of predicting the absolute position or color of each Gaussian (a high-variance, multi-modal problem), the network uses PointTransformerV3 to predict residual offsets that are added to the scaffold.
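
The residual formulation can be sketched as follows. The offset predictor below is a trivial stand-in (it is not PointTransformerV3), and the attribute names in the `scaffold` dictionary are hypothetical; the only thing the sketch is meant to show is that the output is the scaffold plus a predicted correction.

```python
import numpy as np

def refine_with_offsets(scaffold, offset_head):
    """Return scaffold + predicted residual, attribute by attribute."""
    deltas = offset_head(scaffold)
    return {name: scaffold[name] + deltas[name] for name in scaffold}

def zero_offset_head(scaffold):
    # Stand-in for the learned network: predicts no change at all.
    return {name: np.zeros_like(value) for name, value in scaffold.items()}

scaffold = {
    "means": np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 1.2]]),
    "colors": np.array([[0.2, 0.4, 0.6], [0.9, 0.1, 0.3]]),
}
refined = refine_with_offsets(scaffold, zero_offset_head)
```

With a head that outputs zeros, the refined Gaussians equal the scaffold exactly. The network therefore only has to learn small corrections around an already plausible reconstruction, rather than regress every attribute from scratch.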

Figure 1: The SR3R framework, showing the transition from LR views to a refined HR 3DGS through offset learning.

Experiments & SOTA Results

SR3R was benchmarked against feed-forward baselines, including NoPoSplat and DepthSplat (with upsampling), and against per-scene optimizers such as SRGS.

  • Quantitative Dominance: On the ACID dataset, SR3R achieved a PSNR of 27.018 dB, a gain of 1.703 dB over the upsampled baseline (25.315 dB).
  • Zero-Shot Generalization: Perhaps the most impressive feat is SR3R's performance on the DTU and ScanNet++ datasets. Even without seeing these scenes during training, it outperformed the specialized per-scene optimization method FSGS+SRGS.
  • Speed: SR3R takes ~1.69 seconds for inference, compared to 300+ seconds for optimization-based methods.
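
To put the reported numbers in perspective, the short calculation below converts the PSNR gap into an error ratio and the timings into a speedup. It uses only the figures quoted above; the MSE interpretation assumes both PSNR values use the same peak value and holds exactly only for a single image, not for a dataset average.

```python
psnr_sr3r = 27.018       # dB on ACID (from the text)
psnr_baseline = 25.315   # dB, upsampled baseline (from the text)

gain_db = psnr_sr3r - psnr_baseline
# PSNR = 10 * log10(MAX^2 / MSE), so a gain of g dB divides MSE by 10^(g/10).
mse_ratio = 10 ** (gain_db / 10)

t_sr3r = 1.69            # seconds, approximate inference time
t_optim = 300.0          # seconds, lower bound ("300+")
speedup = t_optim / t_sr3r

print(f"PSNR gain: {gain_db:.3f} dB")
print(f"MSE lower by a factor of about {mse_ratio:.2f}")
print(f"Speedup: at least {speedup:.1f}x")
```

The gap of roughly 1.7 dB corresponds to a mean squared error about 1.48 times lower, and the timing figures imply a speedup of at least about 177 times over optimization-based methods.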

Figure 2: Qualitative comparison showing SR3R recovering significantly sharper textures and cleaner edges than existing feed-forward baselines.

Critical Analysis & Conclusion

Takeaway

SR3R's results indicate that residual learning in 3D space is more effective than image-space upsampling. By constraining the problem to "offsets" on a densified scaffold, the model gains the stability it needs to recover high-frequency detail.

Limitations

While fast and accurate, SR3R does add moderate computational overhead compared to "base" feed-forward models (e.g., higher memory usage due to densification). It also currently focuses on 4x upscaling; arbitrary scaling factors might require further architectural flexibility.

Future Outlook

This work paves the way for high-quality 3D content creation on mobile devices or via low-bandwidth streaming, where only sparse LR data can be transmitted, but high-fidelity 3D interaction is required at the edge.
