Strips as Tokens: Artist Mesh Generation with Native UV Segmentation

[SIGGRAPH 2025] SATO: Bridging the Gap Between Neural 3D Generation and Professional Artist Workflows

Abstract

SATO (Strips as Tokens) is an autoregressive transformer framework for generating artist-quality 3D meshes with native UV segmentation. By representing meshes as continuous "triangle strips" and expanding the token vocabulary with semantic delimiters, it reports state-of-the-art results in both triangle and quad mesh synthesis and is the first to predict UV chart partitions during generation.

TL;DR

Current 3D AI often produces "triangle soup": meshes that look good but are a nightmare for professional animators to edit. Strips as Tokens (SATO) changes this by tokenizing meshes as "strips" rather than independent faces. Because strips align naturally with artist-created edge flow, the model can generate high-quality triangle and quad meshes and, for the first time, predict clean UV segmentation boundaries directly in the token stream.

The Motivation: Why Your AI Mesh isn't "Game-Ready"

If you talk to a technical artist, they won't complain about the shape of AI-generated meshes; they'll complain about the topology. Most AI models (like MeshGPT or DeepMesh) treat a mesh as a collection of patches or sorted coordinates. This leads to several issues:

Broken Edge Flow: Animation requires "loops" of edges to deform naturally. Randomly sorted triangles break these loops.
UV Nightmares: Standard AI outputs have no concept of UV islands, making texturing a manual, painful process.
Tri vs. Quad Divide: Most models only generate triangles, whereas professional pipelines prefer quadrilateral (quad) meshes for better subdivision and deformation.

SATO's core insight is to use triangle strips—a legacy graphics concept—as the fundamental unit of tokenization to solve all three problems at once.

Methodology: Strips, Quantization, and Unified Decoding

1. Strips as the "Grammar" of Topology

Instead of predicting isolated triangles, SATO predicts a "zipper-like" sequence where each new vertex forms a face with the previous two. This movement naturally mimics how an artist manually "extrudes" geometry.
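To make the "zipper" idea concrete, here is a minimal sketch of how a triangle strip expands into faces. It follows the classic graphics convention for strips rather than SATO's actual decoder, and the function name strip_to_triangles is hypothetical.

```python
from typing import List, Tuple

Face = Tuple[int, int, int]


def strip_to_triangles(strip: List[int]) -> List[Face]:
    """Expand a triangle strip (a list of vertex indices) into triangles.

    Each vertex after the first two forms a face with the previous two.
    """
    faces: List[Face] = []
    for i in range(2, len(strip)):
        a, b, c = strip[i - 2], strip[i - 1], strip[i]
        # Flip every other triangle so all faces keep a consistent winding.
        faces.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return faces


strip = [0, 1, 2, 3, 4]
triangles = strip_to_triangles(strip)
print(triangles)
```

Five vertices yield three triangles, so a long strip costs roughly one new vertex per face instead of three.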

Figure 1: The SATO pipeline, from point cloud conditioning to strip-based tokenization and unified tri/quad decoding.

2. Native UV Segmentation

How do you teach a Transformer where a UV seam should be? SATO expands the vocabulary. In addition to coordinate tokens, it uses Structural Tokens:

C_t: Signals the end of one strip and the start of another.
C_uv: Signals the boundary of a UV island.

Because the model predicts these tokens autoregressively, it is essentially "thinking" about the texture layout while it builds the geometry.
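As a rough illustration of how structural tokens could carve up a sequence, the sketch below splits a token stream into UV islands, each holding a list of strips. The encoding is an assumption made for readability: vertices are plain integer ids rather than quantized coordinate tokens, the delimiters are the strings "C_t" and "C_uv", and a C_uv token is assumed to close the current strip as well as the current island. The paper's real vocabulary may differ.

```python
from typing import List, Union

Token = Union[int, str]
C_T = "C_t"    # assumed marker: end of one strip, start of another
C_UV = "C_uv"  # assumed marker: boundary of a UV island


def split_stream(tokens: List[Token]) -> List[List[List[int]]]:
    """Group a token stream into UV islands, each a list of strips."""
    islands: List[List[List[int]]] = []
    strips: List[List[int]] = []
    current: List[int] = []
    for tok in tokens:
        if tok == C_T or tok == C_UV:
            # Either delimiter closes the strip being built.
            if current:
                strips.append(current)
                current = []
            # A UV delimiter additionally closes the island.
            if tok == C_UV and strips:
                islands.append(strips)
                strips = []
        else:
            current.append(tok)
    # Flush whatever remains at the end of the sequence.
    if current:
        strips.append(current)
    if strips:
        islands.append(strips)
    return islands


stream = [0, 1, 2, 3, C_T, 3, 2, 4, C_UV, 5, 6, 7]
islands = split_stream(stream)
print(islands)
```

Here the first island contains two strips and the second contains one, so the chart partition falls out of the same pass that recovers the geometry.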

3. The Unified Tri-Quad Trick

This is the most elegant part of the paper. By enforcing a specific vertex ordering (swapping the last two indices in a quad), the authors make a quad strip and a triangle strip look identical to the neural network.

Triangle Mode (Stride=1): 1 vertex added = 1 new triangle.
Quad Mode (Stride=2): 2 vertices added = 1 new quad.

This allows the model to be pre-trained on millions of common triangle meshes and then "fine-tuned" on a smaller, high-quality quad dataset without changing the architecture.
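The sketch below shows how one vertex sequence can be read in either mode. It is an illustration under assumptions, not the paper's decoder: decode_strip is a hypothetical name, the quad ordering follows the usual quad-strip convention of swapping the last two indices so the vertices run around the perimeter, and winding correction for triangles is left out for brevity.

```python
from typing import List, Tuple


def decode_strip(strip: List[int], stride: int) -> List[Tuple[int, ...]]:
    """Decode one vertex sequence as triangles (stride=1) or quads (stride=2)."""
    if stride == 1:
        # Each new vertex closes a triangle with the previous two.
        return [(strip[i - 2], strip[i - 1], strip[i])
                for i in range(2, len(strip))]
    if stride == 2:
        # Each new pair of vertices closes a quad with the previous pair.
        # The last two indices are swapped to give perimeter order.
        return [(strip[i], strip[i + 1], strip[i + 3], strip[i + 2])
                for i in range(0, len(strip) - 3, 2)]
    raise ValueError("stride must be 1 (triangles) or 2 (quads)")


sequence = [0, 1, 2, 3, 4, 5]
tris = decode_strip(sequence, stride=1)
quads = decode_strip(sequence, stride=2)
print(tris)
print(quads)
```

The same six vertices give four triangles or two quads. With an odd-length sequence, quad mode leaves one vertex unpaired.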

Experiments: How Good is the "Flow"?

SATO was tested against SOTA baselines (MeshAnythingV2, BPT, DeepMesh). While others struggle with "skinny" triangles and fragmented patches, SATO produces long, clean rows of faces.

Figure 2: Qualitative comparison showing SATO's cleaner topology and superior structural coherence compared to existing methods.

Key Quantitative Wins:

Compression: Achieved a compression rate of 0.283, lower (and therefore better) than DeepMesh's 0.33, despite having a larger vocabulary, thanks to longer contiguous strips.
Geometry: CD (Chamfer Distance) and HD (Hausdorff Distance) consistently hit lower values, meaning the generated mesh sticks closer to the input point cloud.
UV Layout: As shown below, unwrapping SATO's predicted segments results in layouts that professional artists can actually use for texture painting.

Figure 3: UV layouts generated by SATO. The segments are clean, logical, and respect the object's symmetry.
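For readers unfamiliar with the geometry metrics listed above, here is a small pure-Python sketch of Chamfer and Hausdorff distance between two point sets. Definitions vary between papers (squared vs. unsquared distances, sum vs. mean), so this is one common variant and not necessarily the one used in the SATO evaluation. The point sets are made-up toy data.

```python
import math
from typing import Sequence

Point = Sequence[float]


def _nearest(p: Point, cloud: Sequence[Point]) -> float:
    """Distance from p to its nearest neighbor in cloud."""
    return min(math.dist(p, q) for q in cloud)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbor distance, both ways."""
    return (sum(_nearest(p, b) for p in a) / len(a)
            + sum(_nearest(q, a) for q in b) / len(b))


def hausdorff_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the worst nearest-neighbor distance."""
    return max(max(_nearest(p, b) for p in a),
               max(_nearest(q, a) for q in b))


# Toy data: points sampled from a generated mesh and from the input cloud.
generated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]
cd = chamfer_distance(generated, reference)
hd = hausdorff_distance(generated, reference)
print(cd, hd)
```

Chamfer distance averages the error over all points, while Hausdorff distance reports the single worst mismatch, which is why the two are usually quoted together.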

Critical Analysis & Conclusion

SATO is a significant step toward Game-Ready AI. By moving away from purely geometric heuristics and embracing professional modeling conventions (strips, loops, and UV charts), it delivers assets that are "born" with a usable structure.

Limitations:

Degenerate Quads: In rare cases (odd-length strips), the model might fall back to a triangle to close a loop.
Data Dependencies: The quality of quad-mesh generation is still bottlenecked by the availability of high-quality quad datasets, which are much rarer than triangle datasets.

Final Takeaway: SATO proves that if we want AI to help artists, we must first teach the AI to "speak" the language of artist-created topology.
