WorldMesh: Anchoring Infinite Worlds with Geometry-First Diffusion
Abstract

WorldMesh is a "geometry-first" framework for generating large-scale, navigable multi-room 3D scenes from text prompts. It decouples synthesis into two stages: construction of a 3D mesh scaffold, followed by mesh-conditioned image diffusion, ultimately reconstructing the world as high-fidelity 3D Gaussian Splats (3DGS).

TL;DR

WorldMesh solves the "consistency nightmare" of AI-generated 3D environments. By first building a physical "scaffold" (a 3D mesh of rooms and furniture) and then using AI to "paint" over it, it creates navigable, multi-room apartments that don't warp or melt when you walk through them. It achieves a staggering 96.2% preference rate over existing methods like WorldExplorer.

Problem & Motivation: The "Hallucination" of 3D Space

Current AI generators are great at "hallucinating" beautiful 2D images or short videos. However, when you try to turn these into a 3D world, they fail. Why? Because the AI doesn't actually understand that a chair has a back side or that a door leads to a specific room. This results in:

  1. Geometric Drift: Walls shifting as you move.
  2. Object Incoherence: A sofa turning into a bed when viewed from behind.
  3. Scaling Issues: Models "forgetting" the layout of the first room once you enter the second.

The researchers at TU Munich realized that to build a world, you need a blueprint before you start decorating.

Methodology: The Geometry-to-Pixels Pipeline

WorldMesh breaks the task into four distinct stages:

1. The Blueprint (Layout Generation)

Using Large Language Models (LLMs like Claude Opus), the system generates a JSON floor plan. This isn't just a picture; it's architectural data: wall thickness, ceiling heights, and door placements.
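The paper does not publish the exact schema of this JSON floor plan, so the sketch below is illustrative: field names, units, and structure are assumptions, chosen to show the kind of architectural data (footprints, wall thickness, ceiling heights, door placements) an LLM might be prompted to emit.

```python
import json

# Hypothetical floor-plan JSON in the spirit of WorldMesh's layout stage.
# All field names and units are assumptions; the paper gives no schema.
floor_plan = {
    "rooms": [
        {
            "name": "living_room",
            "footprint": [[0, 0], [6, 0], [6, 4], [0, 4]],  # metres, CCW
            "ceiling_height": 2.7,
        },
        {
            "name": "bedroom",
            "footprint": [[6, 0], [10, 0], [10, 4], [6, 4]],
            "ceiling_height": 2.7,
        },
    ],
    "wall_thickness": 0.15,  # metres
    "doors": [
        {"between": ["living_room", "bedroom"], "width": 0.9, "height": 2.0},
    ],
}

# Round-trip through JSON, as an LLM-generated plan would arrive as text.
plan = json.loads(json.dumps(floor_plan))
assert plan["doors"][0]["between"] == ["living_room", "bedroom"]
```

The key point is that the output is machine-readable structure, not a rendered picture: downstream stages can extrude it directly into geometry.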

2. The Scaffold (Mesh Construction)

The floor plan is extruded into a 3D structural mesh. Then, the system uses an image model to "imagine" where furniture goes, segments those objects using SAM 3, and replaces them with actual 3D object models reconstructed in a canonical coordinate system.
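The extrusion step itself is simple geometry. As a minimal sketch (the authors' actual mesher also carves door openings, adds floor/ceiling faces, and places the reconstructed furniture models), here is how a 2D room footprint becomes wall quads:

```python
import numpy as np

def extrude_walls(footprint, height):
    """Turn a 2D room footprint (list of (x, y) corners, CCW) into wall
    quads: each wall segment becomes 4 vertices, bottom edge then top edge.
    A sketch of the scaffold step only -- no door cutouts, floors, or
    furniture, which WorldMesh's real pipeline adds on top."""
    pts = np.asarray(footprint, dtype=float)
    quads = []
    for i in range(len(pts)):
        a, b = pts[i], pts[(i + 1) % len(pts)]  # wrap around to close loop
        quads.append(np.array([
            [a[0], a[1], 0.0],
            [b[0], b[1], 0.0],
            [b[0], b[1], height],
            [a[0], a[1], height],
        ]))
    return quads

walls = extrude_walls([(0, 0), (6, 0), (6, 4), (0, 4)], height=2.7)
assert len(walls) == 4            # one quad per wall segment
assert walls[0].shape == (4, 3)   # 4 vertices, xyz each
```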

(Figure: WorldMesh model architecture)

3. Mesh-Anchored Diffusion

This is the "secret sauce." Instead of generating random views, the system renders the untextured mesh and uses that render as a hard constraint for a diffusion model (using Flux.2-Klein and Nano Banana Pro). Because the AI is forced to follow the depth and shape of the mesh, the resulting images are perfectly aligned with the 3D space.
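Neither the diffusion models' internals nor the authors' conditioning code is shown in the article, so the following is a deliberately toy numerical sketch of the *idea* of a hard constraint: one channel is free to drift toward a texture guess (the AI's "creativity"), while the geometry channel is re-imposed from the mesh render at every step, so it can never drift.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the rendered, untextured mesh: a depth map of a "room"
# (far wall at 5 m, a block of furniture at 2 m), normalized to [0, 1].
depth = np.full((16, 16), 5.0)
depth[6:10, 6:10] = 2.0
cond = (depth - depth.min()) / (depth.max() - depth.min())

# Toy "denoising" loop. This is NOT a real diffusion sampler -- it only
# mimics the data flow: the free channel relaxes toward a texture guess,
# while the anchored channel is overwritten by the mesh render each step,
# i.e. a hard constraint rather than a soft hint.
x = rng.normal(size=(2, 16, 16))      # channel 0: texture, channel 1: geometry
texture_guess = rng.random((16, 16))
for step in range(50):
    x[0] += 0.1 * (texture_guess - x[0])  # creativity allowed here
    x[1] = cond                           # geometry is fixed here

assert np.allclose(x[1], cond)            # geometry never drifted
```

This is the same reason inpainting-style guidance works: whatever the sampler does elsewhere, the constrained values are restored at every step, so the final output is guaranteed to agree with the mesh.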

4. 3DGS Optimization

Finally, all these AI-generated "photos" of the scaffold are fused into a 3D Gaussian Splatting (3DGS) representation, which allows for real-time, photorealistic navigation.
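Real 3DGS optimization fits positions, covariances, opacities, and colours of millions of Gaussians through a differentiable rasterizer; none of that machinery is reproduced here. As a minimal stand-in for the fusion idea, the sketch below fits a single splat's colour by gradient descent on a photometric L2 loss over several noisy "photos" of it:

```python
import numpy as np

rng = np.random.default_rng(1)

# Several diffusion-generated "views" of the same surface point: the true
# colour plus per-view noise (lighting, sampler variance, etc.).
true_color = np.array([0.8, 0.3, 0.1])
views = true_color + rng.normal(scale=0.05, size=(12, 3))

# Gradient descent on the mean squared photometric error, the same loss
# family 3DGS uses (which adds an SSIM term and optimizes geometry too).
color = np.zeros(3)
lr = 0.1
for _ in range(200):
    grad = 2.0 * (color - views).mean(axis=0)  # d(mean L2)/d(color)
    color -= lr * grad

# The optimum of a pure L2 loss is the mean of the observations.
assert np.allclose(color, views.mean(axis=0), atol=1e-3)
```

Because every input view was rendered against the same mesh scaffold, the views are mutually consistent, which is what lets this per-point averaging-style optimization converge cleanly instead of blurring contradictory geometry.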

Experiments: Dominating the Baselines

The authors compared WorldMesh against leading models like WorldExplorer and SpatialGen.

  • Consistency: In rotations around complex objects (like beds with pillows), WorldMesh maintained shape where others failed.
  • Scale: While baselines struggled with single rooms, WorldMesh generated entire multi-room "Gothic Mansions" and "Scandinavian Apartments."

(Figure: qualitative comparison against baselines)

Perceptual Performance

In a user study with 31 participants, WorldMesh scored 4.48/5.00 in overall quality, while the nearest traditional baseline (DreamScene360) lagged at 3.19.

Critical Analysis & Conclusion

Why it Works

The success of WorldMesh lies in its Inductive Bias. Pure diffusion models have too much freedom. By "anchoring" the pixels to a mesh, WorldMesh restricts the AI's creativity to texture and lighting, while the structure is handled by rigid 3D geometry.

Limitations

  • Single-Story Only: It currently cannot handle multi-floor structures like staircases effectively.
  • Object Quality: It's dependent on the "SAM-3D-Objects" library; if the object reconstruction fails, the scaffold fails.

The Future

WorldMesh is a massive step toward automated AAA game environment design. Imagine typing "A cyberpunk laboratory spanning three rooms" and receiving a fully navigable, 3DGS-ready world in minutes. This effectively bridges the gap between generative AI and functional 3D graphics.
