LATO introduces a novel topology-preserving 3D generative framework that synthesizes explicit meshes using a Vertex Displacement Field (VDF) and Flow Matching. It achieves state-of-the-art results in generating artist-friendly, high-fidelity meshes while being significantly faster than autoregressive alternatives.
TL;DR
High-quality 3D content creation requires meshes that are not just visually accurate but also topologically "clean" enough for animation and rigging. Most current generative models either extract dense, watertight surfaces with Marching Cubes or break down when generating complex meshes one triangle at a time. LATO addresses this with T-Voxels, a structured latent representation that explicitly preserves topology while retaining the speed and scalability of Flow Matching.
The "Watertight" Problem vs. The "Sequence" Bottleneck
In the current 3D generative landscape, researchers are stuck between two worlds:
- Implicit Models (TRELLIS, CLAY): They represent shapes as fields. While scalable, the extracted meshes are dense, irregular triangulations that a professional artist would typically have to retopologize completely before use.
- Explicit Models (MeshGPT, MeshAnything): They treat mesh generation like writing a sentence, emitting one token at a time (autoregressively). However, 3D meshes are much more complex than text: the sequence grows with every face, so these models quickly run out of memory, leading to broken geometry and "holes."
LATO identifies that the missing link is a topology-preserving latent space that is spatially structured.
Methodology: The Secret is in the VDF
LATO's core innovation is the Vertex Displacement Field (VDF). Instead of just asking "Is there a surface here?", LATO samples points on the surface and, for each point, records the displacement (distance and direction) to the three vertices of the triangle it lies on.
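To make the VDF idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a triangle mesh given as a vertex array and a face index array, samples points uniformly on each face, and stores the three displacement vectors from each sample to its triangle's vertices. The function name `sample_vdf`, the per-face sample count, and the output layout are all illustrative assumptions.

```python
import numpy as np

def sample_vdf(vertices, faces, samples_per_face=4, seed=0):
    """Sample surface points and their displacements to the owning triangle's vertices.

    Hypothetical helper for illustration. Returns:
      points        (F*S, 3)    sampled surface positions
      displacements (F*S, 3, 3) vectors from each point to the 3 triangle vertices
      normals       (F*S, 3)    unit face normal of the triangle each point lies on
    Degenerate (zero-area) triangles are not handled.
    """
    rng = np.random.default_rng(seed)
    tri = vertices[faces]                                   # (F, 3, 3)
    num_faces = tri.shape[0]

    # Uniform barycentric coordinates via the square-root trick.
    u = rng.random((num_faces, samples_per_face))
    v = rng.random((num_faces, samples_per_face))
    su = np.sqrt(u)
    bary = np.stack([1 - su, su * (1 - v), su * v], axis=-1)  # (F, S, 3)

    points = np.einsum("fsk,fkd->fsd", bary, tri)           # (F, S, 3)
    # Displacement = vertex position minus sample position, for all 3 vertices.
    disp = tri[:, None, :, :] - points[:, :, None, :]       # (F, S, 3, 3)

    n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    normals = np.repeat(n[:, None, :], samples_per_face, axis=1)

    return points.reshape(-1, 3), disp.reshape(-1, 3, 3), normals.reshape(-1, 3)

# Two triangles sharing the edge (0, 1).
vertices = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
faces = np.array([[0, 1, 2], [0, 1, 3]])
points, displacements, normals = sample_vdf(vertices, faces, samples_per_face=5)
```

Adding each displacement back to its sample point recovers the triangle's vertices exactly, which is the property that lets a decoder locate vertices from surface samples.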
The T-Voxel Workflow

The pipeline has four stages:
- VDF Encoding: Surface points are annotated with displacement and normal vectors.
- Sparse Voxel VAE: This data is compressed into a 128³ sparse grid of T-Voxels.
- Hierarchical Decoding: The decoder doesn't just "guess" where vertices are. It progressively subdivides voxels and uses a Pruning Head to kill off empty space, eventually pinpointing vertex locations.
- Connection Head: A parallel MLP queries the T-Voxels to predict whether an edge should exist between any two given vertices.
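The subdivide-and-prune loop in the hierarchical decoding step can be sketched as follows. This is a toy under stated assumptions: the learned Pruning Head is replaced by a geometric oracle (`oracle_prune`, a hypothetical stand-in that keeps a voxel only if it contains a known vertex), vertices are assumed to be normalized to [0, 1)³, and the starting resolution of 16 is an arbitrary choice for the example.

```python
import numpy as np

# The 8 child offsets of a voxel when the grid resolution doubles.
OFFSETS = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)])

def subdivide(coords):
    """Split every voxel in an (N, 3) integer array into its 8 children."""
    return (coords[:, None, :] * 2 + OFFSETS[None, :, :]).reshape(-1, 3)

def linear_key(coords, res):
    """Flatten 3D integer coordinates into a single integer per voxel."""
    return (coords[:, 0] * res + coords[:, 1]) * res + coords[:, 2]

def oracle_prune(coords, res, vertices):
    """Stand-in for a learned pruning head: keep voxels that contain a vertex."""
    occupied = np.unique(np.floor(vertices * res).astype(np.int64), axis=0)
    keep = np.isin(linear_key(coords, res), linear_key(occupied, res))
    return coords[keep]

def decode_hierarchy(coarse, start_res, final_res, vertices):
    """Alternate subdivision and pruning until the final resolution is reached."""
    coords, res = coarse, start_res
    while res < final_res:
        coords = subdivide(coords)          # 8x more candidates
        res *= 2
        coords = oracle_prune(coords, res, vertices)  # discard empty space
    return coords, res

rng = np.random.default_rng(0)
verts = rng.random((200, 3))
coarse = np.unique(np.floor(verts * 16).astype(np.int64), axis=0)
fine, res = decode_hierarchy(coarse, 16, 128, verts)
```

Because empty children are dropped at every level, the number of active voxels stays bounded by the number of vertices rather than growing toward the 128³ dense count.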
Performance and Efficiency
The "magic" of LATO is that it is parallelizable. Unlike autoregressive models where triangle #1000 must wait for triangle #999, LATO's flow matching solves the entire shape simultaneously.
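The contrast with autoregressive decoding is easiest to see in the sampling loop. Below is a minimal sketch of Euler integration for a flow-matching sampler, assuming a straight-line probability path. The function `toy_velocity` is a hypothetical stand-in for a trained network: it uses a known target so the example is checkable, whereas a real model predicts the velocity from the current state and conditioning alone.

```python
import numpy as np

def toy_velocity(x, t, target):
    """Stand-in for a trained velocity network on a straight-line path.

    For the path x_t = (1 - t) * x_0 + t * x_1, the velocity that carries
    x_t to x_1 by time 1 is (x_1 - x_t) / (1 - t).
    """
    return (target - x) / (1.0 - t)

def sample_flow(x0, target, num_steps=10):
    """Euler-integrate dx/dt = v(x, t) from t=0 to t=1.

    Every row of x (every latent token) is updated in the same step, so the
    number of sequential steps is num_steps regardless of how many tokens exist.
    """
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * toy_velocity(x, t, target)
    return x

rng = np.random.default_rng(0)
noise = rng.normal(size=(1000, 8))      # 1000 latent tokens, 8 channels each
target = rng.normal(size=(1000, 8))
sample = sample_flow(noise, target, num_steps=10)
```

An autoregressive decoder would need on the order of one sequential step per token; here the loop runs ten times whether there are 1,000 tokens or 100,000.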

As shown in the graph above, while competitors like DeepMesh or MeshSilkSong see their generation times skyrocket as mesh complexity grows, LATO stays nearly flat. It can generate 15,000+ triangles in less than 10 seconds on a single H100 GPU.
Why It Matters: Beyond Watertight Meshes
Most 3D AI models require "watertight" shapes (no holes, perfectly closed). In the real world, 3D assets are messy—car engines have open parts, leaves are flat planes. Because LATO models connectivity explicitly through the VDF, it can handle open surfaces and non-manifold assets—the "dirty" data that makes up a huge portion of actual artist-created libraries.
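For readers unfamiliar with these terms, an edge-count test makes them concrete. The sketch below is a generic mesh check, not part of LATO: an edge used by exactly one face lies on an open boundary, and an edge used by more than two faces is non-manifold. It only inspects edge counts and ignores face orientation, so treat it as a simplified illustration.

```python
from collections import Counter

def classify_edges(faces):
    """Return (boundary_edges, non_manifold_edges) for a list of triangles."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1   # undirected edge
    boundary = sorted(e for e, n in counts.items() if n == 1)
    non_manifold = sorted(e for e, n in counts.items() if n > 2)
    return boundary, non_manifold

def is_watertight(faces):
    """Edge-count test only: every edge must be shared by exactly two faces."""
    boundary, non_manifold = classify_edges(faces)
    return not boundary and not non_manifold

tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]   # closed shape
flat_leaf = [(0, 1, 2), (0, 2, 3)]                           # open quad
fin = [(0, 1, 2), (0, 1, 3), (0, 1, 4)]                      # 3 faces on one edge
```

The flat quad and the three-face fin both fail the watertight test, yet shapes like these are common in artist-made assets.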
Scalable City Synthesis
The authors also demonstrated LATO's power by applying it to urban environments. Because the representation is sparse and compositional, it can generate individual building "envelopes" and populate them with fine-grained topological details to create massive cityscapes.

Deep Insight & Conclusion
LATO represents a paradigm shift. It moves away from the "SDF-to-Marching-Cubes" pipeline that has dominated the field for years. By showing that a structured voxel latent can carry enough information to predict discrete graph connectivity, it opens the door for generative models that speak the same language as professional 3D artists.
Limitations: The VDF resolution is still tied to the voxel grid (128³). Although that resolution is high, extremely fine geometric details might still be smoothed over. Future iterations using octree-based or multi-scale T-Voxels could address this remaining hurdle.
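A quick back-of-the-envelope calculation shows the scale involved. It assumes the shape is normalized to a unit cube, which the source does not state, and the helper `grid_stats` is purely illustrative.

```python
def grid_stats(resolution, extent=1.0):
    """Cell edge length and dense cell count for a cubic grid spanning `extent`."""
    return extent / resolution, resolution ** 3

for res in (128, 256):
    edge, cells = grid_stats(res)
    print(f"{res}^3 grid: cell edge {edge:.7f}, dense cells {cells:,}")
```

Each doubling of resolution halves the cell edge but multiplies the dense cell count by eight, which is why sparse, octree-style, or multi-scale structures are the natural route to finer detail.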
