[CVPR 2025] SOFTMIMICGEN: Mastering Deformable Objects via Non-Rigid Trajectory Warping
Abstract

SOFTMIMICGEN is an automated synthetic data generation pipeline designed for deformable object manipulation. By leveraging non-rigid registration and a small set of human demonstrations (1-10), it scales datasets to thousands of high-fidelity trajectories, enabling SOTA performance in complex tasks like cloth folding, rope manipulation, and surgical threading.

Executive Summary

TL;DR: SOFTMIMICGEN is a breakthrough pipeline that automates the generation of massive robot datasets for deformable objects (e.g., cloth, rope, tissue). By starting with just a handful of human demonstrations and applying a novel non-rigid registration mechanism, it synthesizes thousands of diverse, successful trajectories. This removes the "data bottleneck" for soft-body manipulation, enabling robots to learn complex skills like folding towels or suturing tissue with minimal human effort.

Academic Context: This work is a direct evolution of the MIMICGEN lineage. While previous iterations focused on rigid-body invariance (assuming objects have a fixed "center"), SOFTMIMICGEN breaks this mold by treating objects as dynamic point clouds, positioning itself as a core infrastructure for future robot foundation models.

The "Rigidity" Trap in Robot Learning

Current SOTA methods for data generation, such as the original MIMICGEN, rely on a simple but powerful assumption: Invariance. If a robot knows how to pick up a mug at point A, we can mathematically "shift" that motion to pick up a mug at point B by calculating the rigid transform between the two poses.

The Problem: Deformable objects (ropes, sponges, fabrics) have no "fixed pose." When you move one end of a rope, its entire geometry changes non-linearly. Rigid SE(3) transforms cannot capture this. Consequently, existing automated systems fail miserably when the object's initial state deviates even slightly from the demonstration.
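The rigid-replay assumption can be sketched in a few lines. The names here are illustrative, not from the MIMICGEN codebase; the only assumption is a 4x4 homogeneous transform `T` relating the object's demo pose to its new pose.

```python
import numpy as np

def replay_rigid(demo_traj, T):
    """Shift a demonstrated end-effector path by the rigid transform T.

    demo_traj: (N, 4, 4) array of homogeneous end-effector poses.
    T:         (4, 4) rigid SE(3) transform from demo object pose to new pose.
    """
    # Left-multiply every waypoint by the same transform.
    return np.einsum("ij,njk->nik", T, demo_traj)

# A mug shifted 0.2 m along x: every waypoint shifts identically.
T = np.eye(4)
T[0, 3] = 0.2
demo = np.tile(np.eye(4), (5, 1, 1))
new_traj = replay_rigid(demo, T)
```

This is exactly the step that breaks for a rope or towel: there is no single `T` that maps the demo object state to the new one.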

Methodology: From Rigid Transforms to Warp Fields

The core innovation of SOFTMIMICGEN is the shift from Rigid SE(3) Frames to Non-Rigid Registration.

1. Representation as Nodes

The system treats every deformable object as a collection of nodes $O = \{n_i\}_{i=1}^{N_O}$. This "point cloud" representation allows the system to track local deformations that a single coordinate frame would miss.
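A minimal sketch of reducing a raw point cloud to $N_O$ nodes, assuming farthest-point sampling (a common choice for this step; the paper's exact sampler is not reproduced here):

```python
import numpy as np

def object_nodes(points, n_nodes, seed=0):
    """Subsample a raw point cloud (M, 3) into n_nodes representative
    nodes via farthest-point sampling."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(points)))]
    dist = np.full(len(points), np.inf)
    for _ in range(n_nodes - 1):
        # Distance from every point to the nearest node chosen so far.
        dist = np.minimum(dist, np.linalg.norm(points - points[idx[-1]], axis=1))
        idx.append(int(dist.argmax()))
    return points[idx]

# A straight rope discretized as 200 raw points, reduced to 10 nodes.
rope = np.stack([np.linspace(0, 1, 200), np.zeros(200), np.zeros(200)], axis=1)
nodes = object_nodes(rope, n_nodes=10)
```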

2. The Warp Field (Non-Rigid Registration)

When a new scene is generated, the system compares the current object state to the state in the human demonstration. It solves an optimization problem to find a smooth function $f: \mathbb{R}^3 \rightarrow \mathbb{R}^3$ that maps points from the source to the target.
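One plausible instance of such a smooth function is a Gaussian-RBF displacement field fitted over the object nodes; the paper's exact regularized objective may differ, so treat this as a sketch of the idea rather than the authors' solver:

```python
import numpy as np

def fit_warp(source_nodes, target_nodes, sigma=0.2, reg=1e-6):
    """Fit a smooth warp f: R^3 -> R^3 carrying source nodes onto
    target nodes, as a Gaussian-RBF displacement field."""
    d = np.linalg.norm(source_nodes[:, None] - source_nodes[None, :], axis=-1)
    K = np.exp(-(d / sigma) ** 2)
    # Solve the (regularized) interpolation system for the displacements.
    W = np.linalg.solve(K + reg * np.eye(len(source_nodes)),
                        target_nodes - source_nodes)

    def f(p):
        k = np.exp(-(np.linalg.norm(p - source_nodes, axis=-1) / sigma) ** 2)
        return p + k @ W

    return f

# Demo rope nodes vs. a bent version of the same rope in the new scene.
src = np.stack([np.linspace(0, 1, 6), np.zeros(6), np.zeros(6)], axis=1)
tgt = src + np.stack([np.zeros(6), 0.3 * src[:, 0] ** 2, np.zeros(6)], axis=1)
warp = fit_warp(src, tgt)
```

By construction, `warp` maps each source node (near-exactly) onto its target node while interpolating smoothly everywhere else.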

3. Trajectory Adaptation

The robot's end-effector path is not just shifted; it is warped.

  • Position: $p_t \rightarrow f(p_t)$
  • Rotation: $R_t \rightarrow \text{orth}(J_f(p_t)\,R_t)$. Using the Jacobian $J_f$ of the warp field allows the robot's gripper orientation to adapt to the local curvature of the deformed object.
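Given a fitted warp `f`, the two update rules above can be sketched with a finite-difference Jacobian and SVD-based orthogonalization (a standard way to realize the $\text{orth}$ projection; the numerics here are assumptions, not the paper's implementation):

```python
import numpy as np

def warp_pose(f, p, R, eps=1e-4):
    """Warp one end-effector pose: position through f, rotation through
    the orthogonalized Jacobian of f at p."""
    # Column j of J is df/dx_j, estimated by central differences.
    J = np.stack([(f(p + eps * e) - f(p - eps * e)) / (2 * eps)
                  for e in np.eye(3)], axis=1)
    # orth(.) = nearest rotation matrix, via SVD.
    U, _, Vt = np.linalg.svd(J @ R)
    if np.linalg.det(U @ Vt) < 0:  # keep a proper rotation (det = +1)
        U[:, -1] *= -1
    return f(p), U @ Vt

# Toy warp: a shear in the xy-plane standing in for a fitted field.
shear = np.array([[1.0, 0.3, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
f = lambda p: shear @ p
p_new, R_new = warp_pose(f, np.zeros(3), np.eye(3))
```

Note that `J @ R` is generally not orthogonal (the warp locally stretches space), which is why the projection back onto the rotation group is needed.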

Fig 2 (overall system pipeline): The system selects the best source segment based on registration cost and applies the warp field to generate new trajectories.

Experiments: Scaling to the "Unsimulatable"

The authors introduced a suite of 10 challenging tasks across 4 different robot embodiments, including a Humanoid (GR1) and a Surgical Robot (dVRK).

Key Metrics:

  • Scaling Power: For the "Franka - Rope" task, human-only data yielded a 2% success rate, while SOFTMIMICGEN-boosted data achieved 100%.
  • Generalization: Unlike MIMICGEN, which only succeeded in 8% of rope trials, SOFTMIMICGEN achieved a 98% success rate by successfully adapting to varied initial rope segments.
  • Architecture Agnostic: The generated data proved effective for both Diffusion Policies and BC-RNN-GMM architectures.

Table 1 (results across the task suite): Drastic performance gains (often >50%) when using generated data versus limited human source demos.

Sim-to-Real: Bridging the Gap

A critical validation was the deployment on real hardware. Using a "Point Bridge" (a VLM-guided point cloud extractor), the team showed that policies trained purely in sim could achieve Zero-Shot Transfer to a real Franka arm folding a real towel. Furthermore, "Sim-Real Co-training" (mixing 1,000 sim demos with 30 real ones) pushed the success rate of bag loading from 33% to 93.3%.
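A co-training sampler that oversamples the scarce real demos might look like the following. The 50/50 per-batch split is an assumption; the article only reports the dataset sizes (1,000 sim vs. 30 real), not the mixing ratio:

```python
import random

def mixed_batch(sim_demos, real_demos, batch_size=32, real_frac=0.5, seed=0):
    """Draw one co-training batch, oversampling the scarce real demos
    so they fill real_frac of every batch (sampling with replacement)."""
    rng = random.Random(seed)
    n_real = int(batch_size * real_frac)
    return (rng.choices(real_demos, k=n_real)
            + rng.choices(sim_demos, k=batch_size - n_real))

sim = [f"sim_{i}" for i in range(1000)]   # synthetic demos
real = [f"real_{i}" for i in range(30)]   # real-robot demos
batch = mixed_batch(sim, real)
```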

Fig 4 (real-world rollouts): Real-world validation on towel folding, rope manipulation, and bag loading.

Critical Insight & Conclusion

SOFTMIMICGEN proves that the bottleneck in robot learning isn't necessarily the algorithms (like Diffusion Policy), but the data richness. By using non-rigid math to "hallucinate" valid human-like interactions in simulation, we can train robots on corner cases that would take years to collect manually.

Limitations: Currently, the system assumes a fixed sequence of subtasks. Future iterations will likely need to handle "unstructured" deformation recovery—where the robot must decide to retry a fold if the fabric slips.

Final Takeaway: This is a mandatory read for anyone building "Foundation Models" for robotics. Deformable object manipulation is no longer a niche physics problem; it is now a scalable data-generation problem.
