LGTM (Less Gaussians, Texture More) is a novel feed-forward framework for 3D Gaussian Splatting that enables high-fidelity 4K novel view synthesis without per-scene optimization. It achieves this by predicting compact geometric primitives coupled with detailed per-primitive textures, effectively decoupling geometric complexity from rendering resolution.
TL;DR
LGTM (Less Gaussians, Texture More) is the first feed-forward framework capable of 4K novel view synthesis without needing per-scene optimization. By replacing the standard "one color per Gaussian" approach with per-primitive textures, it decouples geometry from appearance. This allows the model to render ultra-high-resolution details using a fraction of the memory and computation required by previous SOTA methods.
Problem & Motivation: The Quadratic Wall
Traditional feed-forward 3D Gaussian Splatting (3DGS) ties the number of primitives directly to the output resolution. Scaling from 512p to 4K multiplies the pixel count, and thus the Gaussian count, by roughly 64x. This creates a "quadratic wall": training and inference quickly exhaust GPU memory (OOM) on standard hardware.
Furthermore, 3DGS is geometrically inefficient for textures. To represent a sharp local detail (like text on a wall), vanilla 3DGS must "spam" hundreds of tiny Gaussians to approximate the pattern, even if the underlying geometry is a simple flat plane. The authors' insight is simple: Why predict millions of points when you can predict a few "textured billboards"?
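The "quadratic wall" is easy to see with back-of-the-envelope arithmetic. A minimal sketch, assuming (hypothetically) a fixed Gaussians-per-pixel budget tied to resolution, as in vanilla feed-forward 3DGS:

```python
# Sketch of the quadratic wall: when primitive count is tied to output
# resolution, pixels and Gaussians grow together. The per-pixel budget
# below is an illustrative assumption, not a number from the paper.
def gaussians_needed(width, height, gaussians_per_pixel=0.25):
    """Estimate primitive count when it scales with output resolution."""
    return int(width * height * gaussians_per_pixel)

low = gaussians_needed(512, 256)     # ~512p baseline
high = gaussians_needed(4096, 2048)  # 4K output
print(high // low)  # Gaussian count grows 64x, tracking the pixel count
```

Doubling the linear resolution quadruples both quantities, which is why naive scaling to 4K fails long before rendering quality becomes the bottleneck.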
Methodology: Decoupling Geometry and Appearance
The core of LGTM is a dual-network architecture that separates the "where" (geometry) from the "what" (texture).
1. The Primitive Network
This module processes low-resolution inputs to establish a robust geometric foundation. It predicts the standard 2DGS parameters: position ($\mu$), scale ($s$), rotation ($r$), and opacity ($o$). Crucially, even though it takes low-res input, it is trained with high-resolution supervision to ensure the primitives are correctly sized for 4K rendering.
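The per-primitive parameter layout described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the flat-vector layout, shapes, and activation choices (sigmoid for opacity, exp for scale) are our assumptions, though the field names follow the paper's notation.

```python
import numpy as np

def split_primitive_params(raw, n_primitives):
    """Split a flat network prediction into 2DGS parameters.

    Assumed per-primitive layout (10 values): position mu (3),
    scale s (2, the two tangent-plane axes of a 2D Gaussian),
    rotation r (4, quaternion), opacity o (1).
    """
    raw = np.asarray(raw).reshape(n_primitives, 10)
    mu, s, r, o = raw[:, :3], raw[:, 3:5], raw[:, 5:9], raw[:, 9:]
    o = 1.0 / (1.0 + np.exp(-o))  # squash opacity into (0, 1)
    s = np.exp(s)                 # keep scales strictly positive
    return {"mu": mu, "s": s, "r": r, "o": o}

params = split_primitive_params(np.zeros(10 * 4), n_primitives=4)
```

Note that 2DGS primitives are planar (two scale axes), which is what makes them natural carriers for flat texture maps in the next stage.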
2. The Texture Network & Learned Projective Mapping
Instead of a single spherical harmonic color, each Gaussian now carries a $T \times T$ texture map. To fill these maps with detail, the authors use Projective Texture Mapping: they "back-project" high-resolution source image pixels onto the Gaussian primitives.
- Feature Fusion: The network aggregates patchified high-res features, projective features, and geometric features to predict both Color ($T^c$) and Alpha ($T^\alpha$) textures.
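The projective step can be sketched geometrically: each texel center on a planar primitive is placed in world space, projected into a source view, and sampled. This is a hedged illustration under simplifying assumptions (a pinhole camera at the origin, nearest-neighbor sampling, no occlusion handling); the paper's method fuses learned features rather than sampling raw pixels directly.

```python
import numpy as np

def fill_texture(center, axes, K, image, T=4):
    """Fill a T x T texture by projecting texel centers into a source image.

    center: (3,) world position of the planar primitive.
    axes:   (2, 3) the primitive's two tangent-plane axes (half-extents).
    K:      (3, 3) pinhole intrinsics; camera assumed at the world origin.
    """
    H, W, _ = image.shape
    tex = np.zeros((T, T, 3))
    uv = (np.arange(T) + 0.5) / T - 0.5  # texel coords in [-0.5, 0.5)
    for i, u in enumerate(uv):
        for j, v in enumerate(uv):
            p = center + u * axes[0] + v * axes[1]  # world-space texel center
            x = K @ p                               # project into the image
            px, py = x[0] / x[2], x[1] / x[2]
            xi = min(max(int(round(px)), 0), W - 1)  # clamp to image bounds
            yi = min(max(int(round(py)), 0), H - 1)
            tex[j, i] = image[yi, xi]               # nearest-neighbor sample
    return tex

img = np.full((8, 8, 3), 0.5)
K = np.array([[8.0, 0.0, 4.0], [0.0, 8.0, 4.0], [0.0, 0.0, 1.0]])
tex = fill_texture(np.array([0.0, 0.0, 1.0]),
                   np.array([[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]), K, img)
```

The key property is that texture resolution $T$ is independent of the primitive count, which is exactly how LGTM decouples appearance detail from geometric complexity.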

Experiments: 4K Efficiency
LGTM was integrated into several baselines including NoPoSplat (unposed), DepthSplat (posed), and Flash3D (monocular).
Performance Benchmarks
Scalability is the most impressive result. Moving from a 512p baseline to a 4K LGTM model:
- Pixels: Increase by 64x.
- Peak Memory: Only increases by 1.8x.
- Inference Time: Only increases by 1.47x.
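Put another way, the *per-pixel* cost collapses at 4K. A quick sanity check on the figures quoted above:

```python
# Sanity-check the scaling claims from the benchmarks above:
# pixels grow 64x from 512p to 4K, while peak memory and latency
# grow far sub-linearly.
pixel_growth = 64
peak_memory_growth = 1.8
inference_time_growth = 1.47

# Relative cost per rendered pixel at 4K vs. the 512p baseline:
memory_per_pixel = peak_memory_growth / pixel_growth
time_per_pixel = inference_time_growth / pixel_growth
print(f"memory/pixel: {memory_per_pixel:.3f}x, time/pixel: {time_per_pixel:.3f}x")
```

Each 4K pixel costs roughly 3% of what a 512p pixel cost in memory and time, which is the practical payoff of decoupling primitive count from resolution.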

As seen in the visual results above, vanilla 3DGS at high resolution often appears blurry or "spotty" because the primitive density cannot keep up with the pixel density. LGTM maintains sharp texture edges and fine details (like foliage and architectural ornaments) by offloading the detail to the texture maps.
Critical Analysis & Conclusion
The "Takeaway"
The industry value of LGTM lies in its scalability. Current VR/AR hardware requires high resolution (4K per eye) but has limited memory. LGTM proves that we don't need million-point clouds for high-fidelity reconstruction; we need "smarter" primitives that can carry their own texture.
Limitations
While textures solve the appearance problem, quality is still bounded by the underlying geometry. If the Primitive Network fails to predict an approximately correct surface (e.g., in challenging multi-view cases with large gaps between views), the textures appear distorted or misaligned. Additionally, the texture resolution is currently a fixed hyperparameter that must be tuned manually for different hardware targets.
In conclusion, LGTM is a major step toward making instant, high-fidelity 3D content generation a reality for high-resolution displays, effectively bridging the gap between neural rendering and traditional computer graphics texturing.
