LATO: 3D Mesh Flow Matching with Structured TOpology Preserving LAtents

LATO: Bridging the Gap Between Scalable Voxel Diffusion and Artist-Friendly Mesh Topology

Abstract

LATO introduces a novel topology-preserving 3D generative framework that synthesizes explicit meshes using a Vertex Displacement Field (VDF) and Flow Matching. It achieves state-of-the-art results in generating artist-friendly, high-fidelity meshes while being significantly faster than autoregressive alternatives.

TL;DR

High-quality 3D content creation requires meshes that are not just visually accurate but topologically "clean" for animation and rigging. Most current AI models either produce "watertight" blobs through Marching Cubes or break down when generating complex meshes one triangle at a time. LATO addresses this with T-Voxels, a structured latent representation that explicitly preserves topology while keeping the speed and scalability of Flow Matching.

The "Watertight" Problem vs. The "Sequence" Bottleneck

In the current 3D generative landscape, researchers are stuck between two worlds:

Implicit Models (TRELLIS, CLAY): They represent shapes as fields and extract a surface afterwards. While scalable, the resulting meshes are dense, irregular "soups of triangles" that no professional artist could use without a complete retopology.
Explicit Models (MeshGPT, MeshAnything): They treat mesh generation like writing a sentence, producing the mesh autoregressively (AR), one token at a time. However, the sequence length grows with the face count, so these models quickly run out of memory or context on complex meshes, leading to broken geometry and "holes."

LATO identifies that the missing link is a topology-preserving latent space that is spatially structured.

Methodology: The Secret is in the VDF

LATO's core innovation is the Vertex Displacement Field (VDF). Instead of just asking "Is there a surface here?", LATO samples points on the surface and, for each sample, records the displacement (distance and direction) to the three vertices of the triangle it lies on.
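To make the idea concrete, here is a minimal sketch of a single VDF sample, under assumptions of our own: the function names `sample_vdf` and `recover_vertices` are hypothetical, the barycentric sampling scheme is a standard choice rather than something the paper summary specifies, and the real model stores these quantities as learned features rather than raw tuples.

```python
import random

def sample_vdf(triangle, rng):
    """Sample one point on a triangle and return it together with the three
    displacement vectors pointing from the sample to the triangle's vertices."""
    a, b, c = triangle
    # Uniform sampling on a triangle via the square-root barycentric trick.
    r1, r2 = rng.random(), rng.random()
    s = r1 ** 0.5
    wa, wb, wc = 1.0 - s, s * (1.0 - r2), s * r2
    point = tuple(wa * a[i] + wb * b[i] + wc * c[i] for i in range(3))
    # Each displacement encodes both distance and direction to one vertex.
    displacements = [tuple(v[i] - point[i] for i in range(3)) for v in (a, b, c)]
    return point, displacements

def recover_vertices(point, displacements):
    """Invert the encoding: sample position plus displacement gives the vertex."""
    return [tuple(point[i] + d[i] for i in range(3)) for d in displacements]

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
point, disp = sample_vdf(tri, random.Random(0))
recovered = recover_vertices(point, disp)
print(point, recovered)
```

The useful property is that every surface sample carries enough information to point back at the exact vertices of its triangle, which an occupancy or distance value alone cannot do.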

The T-Voxel Workflow

Figure: Model Architecture

The pipeline has four stages:

VDF Encoding: Surface points are annotated with displacement and normal vectors.
Sparse Voxel VAE: This data is compressed into a 128³ sparse grid of T-Voxels.
Hierarchical Decoding: Rather than predicting vertex positions in one shot, the decoder progressively subdivides voxels and uses a Pruning Head to discard empty space, eventually pinpointing vertex locations.
Connection Head: A parallel MLP queries the T-Voxels to predict whether an edge should exist between any two given vertices.
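The last two stages can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' decoder: `make_keep` is a hypothetical stand-in for the learned Pruning Head (it simply checks whether a cell contains a known target point), and `close_enough` is a toy distance rule standing in for the learned Connection Head. Seven subdivision levels are used because 2**7 = 128.

```python
from itertools import combinations

def subdivide(cell):
    """Split a cell (x, y, z) at one level into its 8 children at the next level."""
    x, y, z = cell
    return [(2 * x + dx, 2 * y + dy, 2 * z + dz)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]

def make_keep(points):
    """Hypothetical stand-in for a pruning head: keep a cell only if it
    contains at least one of the given points (coordinates in [0, 1))."""
    def keep(cell, level):
        res = 2 ** level
        return any(tuple(int(c * res) for c in p) == cell for p in points)
    return keep

def decode_cells(keep, levels):
    """Coarse-to-fine decoding: subdivide the active cells, then prune."""
    active = [(0, 0, 0)]
    visited = 0
    for level in range(1, levels + 1):
        children = [c for cell in active for c in subdivide(cell)]
        visited += len(children)
        active = [c for c in children if keep(c, level)]
    return sorted(active), visited

def cell_center(cell, levels):
    res = 2 ** levels
    return tuple((c + 0.5) / res for c in cell)

def predict_edges(vertices, connect):
    """Query every vertex pair with a connectivity predicate."""
    return [(i, j) for i, j in combinations(range(len(vertices)), 2)
            if connect(vertices[i], vertices[j])]

def close_enough(p, q):
    # Toy rule (not a learned prediction): connect if squared distance < 1.0.
    return sum((a - b) ** 2 for a, b in zip(p, q)) < 1.0

LEVELS = 7  # 2**7 = 128 cells per axis
targets = [(0.1, 0.1, 0.1), (0.9, 0.1, 0.1), (0.1, 0.9, 0.1)]
cells, visited = decode_cells(make_keep(targets), LEVELS)
vertices = [cell_center(c, LEVELS) for c in cells]
edges = predict_edges(vertices, close_enough)
print(cells, visited, edges)
```

Because pruning happens at every level, the sketch only ever evaluates the children of occupied cells instead of all 128³ cells of the dense grid. The pairwise edge query is independent for each vertex pair, which is what makes that stage easy to run in parallel.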

Performance and Efficiency

LATO's main efficiency advantage is that generation is parallel. Unlike autoregressive models, where triangle #1000 must wait for triangle #999, LATO's flow matching updates the entire latent at every sampling step, so all parts of the shape are generated simultaneously.
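The contrast with autoregressive decoding can be illustrated with a toy flow-matching sampler. Everything here is an assumption made for illustration: the straight-line path x_t = (1 - t) * x_0 + t * x_1, the Euler solver, and the closed-form `velocity` function (which stands in for a trained network and "knows" the target) are not taken from the paper summary.

```python
import random

def velocity(x, t, target):
    """Toy stand-in for a learned velocity network. For the straight-line path
    x_t = (1 - t) * x_0 + t * x_1 with one known target x_1, the exact
    velocity at x_t is (x_1 - x_t) / (1 - t)."""
    return [(x1 - xi) / (1.0 - t) for xi, x1 in zip(x, target)]

def sample(target, steps, rng):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 with Euler steps."""
    x = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # the largest t used is (steps - 1) / steps, so 1 - t > 0
        v = velocity(x, t, target)
        # Every entry of the latent is updated in the same step.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

STEPS = 8
rng = random.Random(0)
small = [0.25, -0.5, 1.0]
large = [0.001 * i for i in range(3000)]
out_small = sample(small, STEPS, rng)
out_large = sample(large, STEPS, rng)
print(max(abs(a - b) for a, b in zip(out_large, large)))
```

Both calls run the same eight solver steps even though the second latent is a thousand times larger. In a real model each step is one batched network evaluation on the GPU, whereas an autoregressive decoder needs a number of sequential steps that grows with the mesh size.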

Figure: Inference Time Comparison

As shown in the graph above, while competitors like DeepMesh or MeshSilkSong see their generation times skyrocket as mesh complexity grows, LATO stays nearly flat. It can generate 15,000+ triangles in less than 10 seconds on a single H100 GPU.

Why It Matters: Beyond Watertight Meshes

Most 3D AI models require "watertight" shapes (no holes, perfectly closed). In the real world, 3D assets are messy: car engines have open parts, and leaves are flat planes. Because LATO models connectivity explicitly through the VDF, it can handle open surfaces and non-manifold assets, the "dirty" data that makes up a huge portion of actual artist-created libraries.
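What "watertight", "open" and "non-manifold" mean can be checked directly from face connectivity. The sketch below uses the standard edge-incidence test (an edge used by exactly one face lies on a boundary, and an edge used by more than two faces is non-manifold); the function name `classify_edges` and the three toy meshes are ours, not part of LATO.

```python
from collections import Counter

def classify_edges(faces):
    """Count how many triangles use each undirected edge, then report
    boundary edges (used once) and non-manifold edges (used more than twice)."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    boundary = sorted(e for e, n in counts.items() if n == 1)
    non_manifold = sorted(e for e, n in counts.items() if n > 2)
    return boundary, non_manifold

def is_watertight(faces):
    boundary, non_manifold = classify_edges(faces)
    return not boundary and not non_manifold

tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]  # closed surface
leaf = [(0, 1, 2), (0, 2, 3)]                         # flat open quad
fin = [(0, 1, 2), (0, 1, 3), (0, 1, 4)]               # three faces share edge (0, 1)

for name, mesh in (("tetra", tetra), ("leaf", leaf), ("fin", fin)):
    print(name, is_watertight(mesh), classify_edges(mesh))
```

A pipeline built on closed implicit surfaces can only produce meshes like `tetra`. A representation that predicts vertices and edges explicitly has no such restriction, so shapes like `leaf` and `fin` remain expressible.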

Scalable City Synthesis

The authors also demonstrated LATO's power by applying it to urban environments. Because the representation is sparse and compositional, it can generate individual building "envelopes" and populate them with fine-grained topological details to create massive cityscapes.

Figure: City Synthesis Result

Deep Insight & Conclusion

LATO represents a paradigm shift. It moves away from the "SDF-to-Marching-Cubes" pipeline that has dominated the field for years. By proving that a structured voxel latent can carry enough information to predict discrete graph connectivity, it opens the door for generative models that speak the same language as professional 3D artists.

Limitations: The VDF resolution is still tied to the voxel grid (128³). While impressive, extremely tiny geometric details might still be smoothed over. Future iterations using octree-based or multi-scale T-Voxels could solve this remaining hurdle.
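A quick back-of-the-envelope calculation shows the scale of this limit. It assumes a shape normalized to the unit cube and a simple floor quantization into the 128³ grid; how LATO actually places vertices inside a voxel is not described in this summary, so treat the helper `voxel_index` as hypothetical.

```python
RESOLUTION = 128

def voxel_index(p, resolution=RESOLUTION):
    """Map a point in the unit cube to the integer index of its voxel."""
    return tuple(min(int(c * resolution), resolution - 1) for c in p)

voxel_size = 1.0 / RESOLUTION   # edge length of one voxel in the unit cube
dense_cells = RESOLUTION ** 3   # cell count of the equivalent dense grid

a = (0.5001, 0.5, 0.5)
b = (0.5040, 0.5, 0.5)  # about 0.004 away from a, less than one voxel edge
c = (0.5100, 0.5, 0.5)  # about 0.010 away from a, more than one voxel edge

print(voxel_size, dense_cells)
print(voxel_index(a), voxel_index(b), voxel_index(c))
```

Here `a` and `b` fall into the same voxel while `c` lands in the neighbouring one, so features much smaller than roughly 1/128 of the bounding box compete for a single cell. That is the motivation for the octree-based or multi-scale variants mentioned above.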
