DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising

[CVPR 2026] DreamPartGen: Bridging the Gap Between Geometry and Relational Semantics in 3D Generation

Abstract

DreamPartGen is a semantically grounded text-to-3D generation framework that treats objects as compositions of meaningful parts. It introduces Duplex Part Latents (DPLs) and Relational Semantic Latents (RSLs) to achieve state-of-the-art results, including a 53% reduction in Chamfer Distance and over 20% improvement in CLIP/ULIP alignment.

Executive Summary

TL;DR: DreamPartGen is a framework that moves away from monolithic 3D generation toward compositional generation. By introducing Duplex Part Latents (DPLs) and Relational Semantic Latents (RSLs), it aims to make generated objects not just visually plausible but logically assembled (e.g., ensuring a chair's seat is actually above its legs). It reports a 53% reduction in Chamfer Distance relative to existing state-of-the-art methods.

Background: Within the 3D vision community, most text-to-3D models (such as DreamFusion or Trellis) generate an object as a single monolithic representation with no explicit notion of parts. DreamPartGen instead works in part-level generative modeling, shifting the focus from "what the object looks like" to "how the object is built."

The "Floating Wheel" Problem: Motivation

The primary pain point in current text-to-3D is the lack of structural grounding. When you ask a standard model for a "car," it might generate something that looks like a car from a distance, but the wheels might be physically detached from the chassis, or the side mirrors might be asymmetrical.

The authors' insight is that language contains the blueprint. A prompt doesn't just describe a "mug"; it implies a "handle attached to a body." DreamPartGen operationalizes these linguistic relations as explicit variables that guide the entire denoising process.

Methodology: The Co-Denoising Architecture

The core of DreamPartGen lies in its collaborative latent space, which synchronizes two distinct types of information:

Duplex Part Latents (DPLs): These represent individual parts. They are "duplex" because they store both 3D latent sequences (for geometry) and 2D latent sequences (for appearance), tied together by a learnable Part-Identifier.
Relational Semantic Latents (RSLs): These are the "brain" of the operation. They encode the text prompt as a relational graph of triplets (e.g., Part A - Support - Part B).
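
To make the two representations concrete, here is a minimal sketch of how part slots and relation triplets could be held in memory. It is an illustration only: the class names, the field names, and the helper `empty_part` are all hypothetical, and the relations are kept as plain symbolic strings, whereas the article describes RSLs as latents.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DuplexPartLatent:
    """One part slot: paired 3D and 2D latent sequences sharing an identifier."""
    part_id: int                          # stand-in for the learnable Part-Identifier
    name: str
    geometry_tokens: List[List[float]]    # 3D latent sequence (geometry)
    appearance_tokens: List[List[float]]  # 2D latent sequence (appearance)


@dataclass(frozen=True)
class RelationTriplet:
    """Symbolic stand-in for one relation, e.g. (legs, support, seat)."""
    subject: str
    relation: str
    target: str


@dataclass
class PartGraph:
    parts: List[DuplexPartLatent]
    relations: List[RelationTriplet]

    def unknown_parts(self) -> List[str]:
        """Names used in a triplet that have no matching part slot."""
        known = {p.name for p in self.parts}
        missing: List[str] = []
        for r in self.relations:
            for name in (r.subject, r.target):
                if name not in known and name not in missing:
                    missing.append(name)
        return missing

    def related_to(self, name: str) -> List[str]:
        """Parts that share at least one triplet with `name`."""
        out: List[str] = []
        for r in self.relations:
            if r.subject == name:
                out.append(r.target)
            elif r.target == name:
                out.append(r.subject)
        return out


def empty_part(part_id: int, name: str, tokens: int = 2, dim: int = 4) -> DuplexPartLatent:
    """Hypothetical helper: a part slot filled with zero latents."""
    geometry = [[0.0] * dim for _ in range(tokens)]
    appearance = [[0.0] * dim for _ in range(tokens)]
    return DuplexPartLatent(part_id, name, geometry, appearance)


# The chair example from the summary: the legs support the seat.
chair = PartGraph(
    parts=[empty_part(0, "seat"), empty_part(1, "legs")],
    relations=[RelationTriplet("legs", "support", "seat")],
)
```

Keeping the triplets separate from the part slots mirrors the split described above: the slots carry geometry and appearance, while the graph only says which slots must agree with each other.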

Figure 1: The synchronized co-denoising framework, where DPLs and RSLs interact to ensure semantic-geometric consistency.

The "magic" happens during synchronized co-denoising, which operates at two levels:

Intra-Part Sync: Aligns the geometry of a part with its appearance (e.g., making sure the 'metallic' texture lands on the 'blade' geometry).
Inter-Part Sync: Uses the RSLs to act as a "global planner," ensuring parts are assembled according to functional and spatial constraints.
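
The two sync levels can be pictured as two kinds of information flow in a single update. The toy NumPy sketch below is not the authors' network: it uses plain, unlearned scaled dot-product attention, the function names and the `rate` parameter are assumptions, and relations are reduced to a list of index pairs rather than latents. It only shows which tokens are allowed to influence which.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(q, kv):
    """Scaled dot-product attention of q (n, d) over kv (m, d); returns (n, d)."""
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ kv


def co_denoise_step(geo, app, edges, rate=0.1):
    """One toy update of all part slots.

    geo, app: arrays of shape (P, T, D) holding the 3D and 2D latent
              sequences of P parts.
    edges:    list of (i, j) part-index pairs linked by a relation
              (a simplification of the relational latents).
    """
    new_geo, new_app = geo.copy(), app.copy()

    # Intra-part sync: each part's geometry and appearance tokens
    # exchange information with each other, and with nothing else.
    for p in range(geo.shape[0]):
        new_geo[p] += rate * (attend(geo[p], app[p]) - geo[p])
        new_app[p] += rate * (attend(app[p], geo[p]) - app[p])

    # Inter-part sync: only parts joined by a relation see each other.
    for i, j in edges:
        new_geo[i] += rate * (attend(geo[i], geo[j]) - geo[i])
        new_geo[j] += rate * (attend(geo[j], geo[i]) - geo[j])

    return new_geo, new_app


rng = np.random.default_rng(0)
geo = rng.normal(size=(3, 4, 8))   # 3 parts, 4 tokens each, 8-dim latents
app = rng.normal(size=(3, 4, 8))
new_geo, new_app = co_denoise_step(geo, app, edges=[(0, 1)])
```

With `edges=[(0, 1)]`, changing part 1 affects the update of part 0 but leaves part 2 untouched, which is the sense in which the relations act as a planner over the parts.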

Breakthrough Results

The quantitative leap is substantial. On the PartRel3D dataset (a 300K triplet dataset curated by the authors), DreamPartGen achieved:

Chamfer Distance (CD): 0.081 (lower is better; baselines such as HoloPart score 0.355).
Part Independence (IoU): 0.304 (lower is better: less overlap between parts means they are cleanly separated rather than messy, overlapping blobs).
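
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute each. The summary does not say which Chamfer Distance variant the paper uses (squared or unsquared distances, normalization) or how part overlap is measured, so the squared, mean-normalized form and the voxel-grid IoU here are assumptions.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between point sets a (n, 3) and b (m, 3).

    Assumed variant: mean squared nearest-neighbour distance from a to b,
    plus the same from b to a.
    """
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (n, m) squared distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()


def part_overlap_iou(occ_a, occ_b):
    """Intersection over union of two boolean occupancy grids of equal shape."""
    intersection = np.logical_and(occ_a, occ_b).sum()
    union = np.logical_or(occ_a, occ_b).sum()
    return float(intersection / union) if union else 0.0


# Two single-point clouds one unit apart: each direction contributes 1.
a = np.array([[0.0, 0.0, 0.0]])
b = np.array([[1.0, 0.0, 0.0]])
cd = chamfer_distance(a, b)

# Two slabs in a 4x4x4 grid that share one layer of voxels.
part_a = np.zeros((4, 4, 4), dtype=bool)
part_b = np.zeros((4, 4, 4), dtype=bool)
part_a[0:2] = True
part_b[1:3] = True
iou = part_overlap_iou(part_a, part_b)
```

A Chamfer Distance of zero means the two point sets coincide, and an IoU of zero between two parts means they occupy disjoint space, which is why lower is better for both numbers above.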

Figure 2: Qualitative comparison showing how DreamPartGen avoids the "surface tearing" and "detached parts" common in prior SOTA models.

Beyond Static Objects: Editing and Scenes

Because the model understands "parts" as separate entities, it excels at Relational Part Editing. You can ask the model to "add a hat on the head" of a character, and because it has a persistent "head" slot (DPL), it can re-denoise just that specific region while maintaining the global context.
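
The idea of re-denoising one slot while the rest stay fixed can be sketched as follows. This is a toy illustration under stated assumptions: `edit_part`, `toy_denoiser`, and their parameters are hypothetical, and the placeholder denoiser simply pulls the edited slot toward the mean of the frozen slots instead of running a learned, text-conditioned model.

```python
import numpy as np


def toy_denoiser(slot, context):
    """Placeholder for a learned denoiser: nudge the slot toward the context mean.

    slot:    (T, D) latents of the part being edited.
    context: (P - 1, T, D) latents of all other parts, held fixed.
    """
    target = context.mean(axis=(0, 1))  # (D,) summary of the frozen parts
    return slot + 0.5 * (target - slot)


def edit_part(latents, part_index, denoise_fn, steps=4, noise_scale=1.0, seed=0):
    """Re-noise and re-denoise a single part slot, leaving the others untouched.

    latents: (P, T, D) array with P >= 2 part slots.
    """
    rng = np.random.default_rng(seed)
    edited = latents.copy()

    # Add noise to the selected slot only.
    noise = rng.normal(size=edited[part_index].shape)
    edited[part_index] = edited[part_index] + noise_scale * noise

    # Denoise that slot, conditioning on the frozen remainder as context.
    for _ in range(steps):
        context = np.delete(edited, part_index, axis=0)
        edited[part_index] = denoise_fn(edited[part_index], context)

    return edited


rng = np.random.default_rng(42)
character = rng.normal(size=(3, 4, 8))   # e.g. slots for head, torso, legs
edited = edit_part(character, part_index=0, denoise_fn=toy_denoiser)
```

Because only `edited[part_index]` is ever written, the other slots come back bit-for-bit identical, which is the property that lets an edit stay local.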

Moreover, it scales to Mini-Scene Generation, treating entire objects as "macro-parts" and using the same relational logic to arrange a table and chairs into a coherent dining set.

Critical Analysis & Future Outlook

Takeaway: This work suggests that 3D generation is moving toward "Assembly-as-Inference." By giving the model a relational "blueprint" derived from language, it addresses many of the stability and consistency issues that have plagued the field.

Limitations: The model currently relies on the PartRel3D dataset for training. While it generalizes to rare parts, its performance still partially depends on the quality of the relational triplets parsed from the text.

Future Work: The authors suggest this structured representation could be a foundation for Embodied AI, where a robot needs to understand not just what a tool is, but where its functional handle is located relative to its working end.
