ViewSplat is a novel pose-free, feed-forward 3D Gaussian Splatting (3DGS) framework that achieves state-of-the-art novel view synthesis by shifting from static primitive regression to a view-adaptive dynamic representation. It delivers superior fidelity on the RE10K and ACID benchmarks while maintaining real-time rendering at 154 FPS.
TL;DR
ViewSplat introduces a view-adaptive dynamic refinement layer to generalizable 3D Gaussian Splatting. By using a hypernetwork to generate target-view-conditioned residual updates, it allows 3D primitives to "morph" their geometry and appearance on-the-fly to match specific viewpoints. This achieves SOTA fidelity in pose-free reconstruction, outperforming static methods that rely on high-order Spherical Harmonics.
Positioning: This work moves beyond the "regression-only" paradigm of feed-forward 3DGS, introducing a lightweight dynamic update mechanism that mimics the refinement process of optimization-based methods but at 150+ FPS.
Problem & Motivation: The Static Primitive Bottleneck
Traditional generalizable 3DGS (e.g., pixelSplat, MVSplat) follows a rigid pipeline: extract features -> regress Gaussian attributes -> render. While fast, this "one-size-fits-all" approach fails when encountering:
- Non-Lambertian effects: Mirrors and glossy surfaces look different from every angle.
- Estimation Errors: A single forward pass often produces slightly displaced centers or incorrect scales that a static renderer cannot fix.
The authors observed that while regressing a universal 3D scene is hard, adjusting primitives for a specific view is significantly easier.
Methodology: View-Adaptive Dynamic Splatting
The core innovation of ViewSplat is the View-Dependent Head. Instead of predicting only static Gaussian attributes, the model predicts the weights of a dynamic MLP.

1. The Hypernetwork Design
The architecture uses a Dense Prediction Transformer (DPT) to generate pixel-wise parameters for a "View MLP." This allows the model to create custom response functions for different parts of the scene (e.g., complex functions for shiny metal, simple ones for flat walls).
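A minimal PyTorch sketch of this idea follows; the module name `HyperHead`, the layer sizes, and the parameter-splitting scheme are illustrative assumptions, not the paper's implementation. A 1x1 convolution maps dense features to a flat parameter vector per pixel, which is reshaped into the weights of a tiny one-hidden-layer "View MLP" and applied to the pose embedding:

```python
# Sketch only: a hypernetwork head that predicts per-pixel weights for a
# small View MLP. All dimensions and names are assumptions for illustration.
import torch
import torch.nn as nn

class HyperHead(nn.Module):
    def __init__(self, feat_dim=128, pose_dim=4, hidden=16, out_dim=8):
        super().__init__()
        self.hidden, self.pose_dim, self.out_dim = hidden, pose_dim, out_dim
        # Parameter count of a one-hidden-layer MLP:
        # W1 (hidden x pose_dim) + b1 (hidden) + W2 (out_dim x hidden) + b2 (out_dim)
        n_params = hidden * pose_dim + hidden + out_dim * hidden + out_dim
        self.to_weights = nn.Conv2d(feat_dim, n_params, kernel_size=1)

    def forward(self, feats, pose_emb):
        # feats: (B, feat_dim, H, W) dense features (e.g., from a DPT head)
        # pose_emb: (B, pose_dim) reparameterized target pose
        B, _, H, W = feats.shape
        p = self.to_weights(feats).permute(0, 2, 3, 1).reshape(B, H * W, -1)
        h, d, o = self.hidden, self.pose_dim, self.out_dim
        # Split the flat parameter vector into the View MLP's weights and biases.
        W1, rest = p[..., : h * d].reshape(B, H * W, h, d), p[..., h * d :]
        b1, rest = rest[..., :h], rest[..., h:]
        W2 = rest[..., : o * h].reshape(B, H * W, o, h)
        b2 = rest[..., o * h :]
        # Run the dynamically generated MLP on the pose embedding, per pixel.
        x = pose_emb[:, None, :, None].expand(B, H * W, d, 1)
        x = torch.relu(W1 @ x + b1[..., None])
        return (W2 @ x + b2[..., None]).squeeze(-1)  # (B, H*W, out_dim)
```

Because the weights themselves are predicted from scene features, a pixel on shiny metal can get a sharply non-linear response function while a flat wall gets a nearly constant one.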
2. Pose Reparameterization
Feeding a raw $4 \times 4$ pose matrix into an MLP is inefficient. ViewSplat instead reparameterizes the target pose into two components (a sketch follows the list below):
- A 3D unit viewing direction.
- A 1D log-scale distance (preserving sensitivity for near-field objects where perspective changes are most drastic).
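A minimal sketch of this reparameterization, assuming a camera-to-world pose matrix and a reference point such as the scene centroid (both the reference point and the function name are assumptions):

```python
# Sketch only: convert a 4x4 target pose into a (direction, log-distance) embedding.
import torch

def reparameterize_pose(cam_to_world, ref_point):
    # cam_to_world: (4, 4) target pose; ref_point: (3,) point the view is taken toward
    cam_center = cam_to_world[:3, 3]
    offset = ref_point - cam_center
    dist = offset.norm().clamp_min(1e-8)
    direction = offset / dist   # 3D unit viewing direction
    log_dist = dist.log()       # 1D log-scale distance: equal distance *ratios* map to
                                # equal increments, keeping near-field sensitivity high
    return torch.cat([direction, log_dist[None]])  # (4,) pose embedding
```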
3. Residual Refinement
The View MLP outputs residuals ($\Delta \mu, \Delta \alpha, \Delta r, \Delta s, \Delta c$) which are added to the base Gaussians. This "Dynamic Refinement" allows the primitives to shift and change opacity/color based on where the camera is.
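The sketch below shows one plausible way to apply such residuals; the `Gaussians` container and the specific update rules (clamped opacity, multiplicative scale update, renormalized quaternions) are assumptions for illustration, not the paper's exact formulation:

```python
# Sketch only: add View-MLP residuals to the base Gaussians for one target view.
import torch
from dataclasses import dataclass

@dataclass
class Gaussians:
    mu: torch.Tensor      # (N, 3) centers
    alpha: torch.Tensor   # (N, 1) opacities in [0, 1]
    rot: torch.Tensor     # (N, 4) unit quaternions
    scale: torch.Tensor   # (N, 3) positive per-axis scales
    color: torch.Tensor   # (N, 3) base color (SH DC term)

def refine(g, d_mu, d_alpha, d_rot, d_scale, d_color):
    # Residuals come from the View MLP, conditioned on the target pose.
    return Gaussians(
        mu=g.mu + d_mu,
        alpha=(g.alpha + d_alpha).clamp(0.0, 1.0),
        rot=torch.nn.functional.normalize(g.rot + d_rot, dim=-1),
        scale=g.scale * torch.exp(d_scale),  # multiplicative update keeps scales positive
        color=g.color + d_color,
    )
```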
Experiments & Results: New SOTA in Pose-Free NVS
ViewSplat was evaluated on the RE10K (indoor) and ACID (nature) datasets, consistently outperforming prior pose-free baselines across all reported metrics.

Key Insights from Results:
- SH Degrees vs. Dynamic Updates: Increasing the Spherical Harmonics (SH) degree from 4 to 8 yielded almost no gain, yet ViewSplat's residual updates at SH degree 4 vastly outperformed the high-order SH models. This suggests that a small MLP captures non-linear, view-dependent appearance far better than stacking SH coefficients (see the parameter-count sketch after this list).
- Syncing $\mu$ and $\alpha$: An important ablation showed that if you update the position ($\mu$) without updating the opacity ($\alpha$), the scene collapses into artifacts. Proper view-adaptation requires a holistic update of all geometric and photometric attributes.
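A quick back-of-the-envelope comparison (standard SH arithmetic, not figures from the paper) shows why raising the SH degree is an expensive way to buy expressiveness:

```python
# A degree-d SH expansion has (d + 1)**2 coefficients per color channel.
def sh_coeffs_per_gaussian(degree, channels=3):
    return channels * (degree + 1) ** 2

print(sh_coeffs_per_gaussian(4))  # 75 floats per Gaussian
print(sh_coeffs_per_gaussian(8))  # 243 floats per Gaussian: >3x the storage for ~no gain
```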

Critical Analysis & Conclusion
Takeaway: ViewSplat successfully demonstrates that the bottleneck in feed-forward 3DGS wasn't necessarily the feature backbone, but the rigidity of the primitive representation. By making the splatting process view-adaptive, they regained much of the fidelity lost when moving away from per-scene optimization.
Limitations:
- Unobserved Regions: Like all reconstruction models, it cannot "hallucinate" parts of the room that weren't in the input images (see Fig. 6 in the paper).
- Rendering Overhead: The dynamic refinement does reduce rendering speed (from ~390 FPS to ~150 FPS), though it remains well above the real-time threshold.
Future Outlook: This view-adaptive paradigm could be extended to 4D (temporal) scenes or integrated with generative priors (Diffusion Models) to solve the "unobserved region" problem.
