WisPaper
[CVPR 2025] OmniTrack: Solving the Embodiment Gap via Physics-Consistent Reference
Abstract

OmniTrack is a two-stage motion tracking framework for humanoid robots that achieves general, high-dynamic control by decoupling physical feasibility from motion tracking. By converting noisy human motion data into physics-consistent references, it sets a new SOTA in tracking accuracy and stability, enabling hour-long continuous execution and complex maneuvers like flips on the Unitree G1.

TL;DR

Humanoid robots often struggle with "unnatural" motions because the reference data (captured from humans) doesn't account for the robot's specific physics. OmniTrack introduces a two-stage solution: it first "imagines" how a robot would realistically perform a human movement in a simulator, and then teaches the robot to follow that realistic version. This decoupling improves stability enough for the Unitree G1 to perform cartwheels, flips, and hour-long continuous runs.

The "Ghosting" Problem: Why Humanoids Fall

When we retarget human motion capture (MoCap) data to a robot, we encounter the Embodiment Gap. Humans have different weight distributions, joint limits, and strengths compared to robots.

If you force a robot to follow a human skeleton exactly, the reference itself often contains "physical artifacts" that the robot cannot reproduce:

  • Floating: The reference tells the robot to be in the air when its physics says it should be on the ground.
  • Penetration: The reference joint angles cause the robot's limbs to clip through its own body or the floor.
  • Foot Skating: The robot’s feet slide unrealistically because the human’s gait was different.
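The three artifacts above can be checked with simple geometric heuristics on the retargeted foot trajectories. The sketch below is only an illustration: the function name `find_artifacts`, the thresholds, and the input layout (a list of frames, each holding one `(x, y, z)` position per foot, in metres, with the floor at `z = 0`) are all assumptions, not something taken from the OmniTrack paper.

```python
import math

def find_artifacts(feet, dt, contact_height=0.02, skate_speed=0.10, max_air_time=0.6):
    """Label each frame of a retargeted reference with crude artifact flags.

    feet: list of frames; each frame is a list of (x, y, z) foot positions.
    All thresholds are illustrative guesses, not values from the paper.
    """
    report = []
    air_frames = 0  # consecutive frames with every foot off the ground
    for i, frame in enumerate(feet):
        labels = set()
        lowest = min(z for _, _, z in frame)

        # Penetration: some foot is below the floor plane.
        if lowest < 0.0:
            labels.add("penetration")

        # Floating: no foot near the ground for longer than a plausible jump.
        if lowest > contact_height:
            air_frames += 1
            if air_frames * dt > max_air_time:
                labels.add("floating")
        else:
            air_frames = 0

        # Foot skating: a foot that stays in contact still slides horizontally.
        if i > 0:
            for prev, cur in zip(feet[i - 1], frame):
                in_contact = prev[2] <= contact_height and cur[2] <= contact_height
                speed = math.hypot(cur[0] - prev[0], cur[1] - prev[1]) / dt
                if in_contact and speed > skate_speed:
                    labels.add("foot_skating")

        report.append(sorted(labels))
    return report
```

A check like this only detects problems in the reference; it does not repair them, which is the job the article assigns to Stage 1.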

Current SOTA methods try to make the controller "smart" enough to ignore these errors. But as the authors of OmniTrack argue, this forces the robot to solve two problems at once: How do I move? and Is this move even possible?

Methodology: The Power of Decoupling

OmniTrack addresses this by splitting the problem into two distinct stages, moving from "Ideal World" to "Real World."

Stage 1: Physical Motion Generation (PMG)

In this stage, the system acts as a motion filter. It uses a policy that has "God Mode" access (privileged information like exact global positions and velocities) within a simulator. This policy takes the messy human MoCap and "rolls it out" in a physics engine. The result? A new trajectory that looks like the human original but obeys every law of the robot's specific physics.

Fig 1 (OmniTrack Framework): The two-stage pipeline. Stage 1 refines the data; Stage 2 trains the deployment policy.
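To make the "roll it out in a physics engine" idea concrete, here is a deliberately tiny stand-in: a one-dimensional point mass (the robot's body height) that can only push against the ground while its leg is touching it. A hand-written PD controller plays the role of the privileged policy, which in OmniTrack is a learned policy inside a full simulator; the function name `rollout_reference` and every constant below are assumptions chosen for illustration.

```python
def rollout_reference(ref_heights, dt=0.01, mass=30.0, leg_length=0.8,
                      min_height=0.4, kp=4000.0, kd=300.0,
                      max_force=900.0, g=9.81):
    """Track a reference body height under toy physics and return what was achieved.

    The returned trajectory is the 'physics-consistent' version of the reference:
    it follows the target where possible but never violates the toy dynamics.
    """
    z, vz = ref_heights[0], 0.0
    feasible = []
    for target in ref_heights:
        # The leg can only generate force while the foot reaches the ground.
        in_contact = z <= leg_length
        force = 0.0
        if in_contact:
            force = kp * (target - z) - kd * vz
            # Legs can push but not pull, and actuators saturate.
            force = min(max(force, 0.0), max_force)

        # Semi-implicit Euler step under gravity.
        vz += (force / mass - g) * dt
        z += vz * dt

        # Fully crouched: the body cannot go lower.
        if z < min_height:
            z, vz = min_height, 0.0
        feasible.append(z)
    return feasible


# A reference that "floats" at 1.2 m, well above what the leg can support.
reference = [0.8] * 50 + [1.2] * 100
achieved = rollout_reference(reference)
```

In this toy, the rolled-out trajectory hops toward the floating target and falls back under gravity instead of hovering, so the output stays close to the intent of the reference while remaining achievable.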

Stage 2: General Motion Tracking (GMT)

Now that we have a dataset of feasible motions, we train the actual deployment controller. Unlike Stage 1, this policy only sees what a real robot sees: IMU data and joint positions (proprioception). Because it is no longer fighting against "impossible" commands, the controller can focus entirely on robustness and balance.
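The difference between the two stages is largely a difference in what each policy is allowed to observe. The sketch below shows one hypothetical way to express that split; the `SimState` fields, their ordering, and the three-joint toy robot are assumptions for illustration, not the observation layout used in the paper.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SimState:
    """Everything the simulator knows about the robot (hypothetical layout)."""
    root_position: List[float]          # world frame, available only in simulation
    root_linear_velocity: List[float]   # world frame, available only in simulation
    imu_angular_velocity: List[float]   # measurable by the onboard IMU
    imu_projected_gravity: List[float]  # gravity direction in the body frame, from the IMU
    joint_positions: List[float]        # measurable by joint encoders

def proprioceptive_observation(s: SimState) -> List[float]:
    """Stage 2 input: only signals a real robot can measure onboard."""
    return s.imu_angular_velocity + s.imu_projected_gravity + s.joint_positions

def privileged_observation(s: SimState) -> List[float]:
    """Stage 1 input: exact global state plus the proprioceptive signals."""
    return s.root_position + s.root_linear_velocity + proprioceptive_observation(s)
```

Keeping the deployable observation as a strict subset of the privileged one makes it easy to verify that nothing simulator-only leaks into the policy that runs on hardware.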

Experimental Results: Scaling the Skills

The researchers tested OmniTrack on two motion datasets, LAFAN1 and AMASS. The results show that as more diverse data is added, traditional methods start to fail because the "noise" (infeasible motions) becomes overwhelming. OmniTrack, however, stays stable.

| Method | Success Rate (Hard Motions) | MPJPE (Error ↓) |
| :--- | :--- | :--- |
| OmniH2O | 48.32% | 54.58 |
| BeyondMimic | 70.04% | 55.75 |
| OmniTrack (Ours) | 84.81% | 46.43 |
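MPJPE (mean per-joint position error) is the average Euclidean distance between each tracked joint and its reference position, taken over all joints and frames. A minimal sketch of the computation follows; the function name and input layout are assumptions, and the result is in whatever length unit the inputs use (the table above does not state its unit).

```python
import math

def mpjpe(tracked, reference):
    """Mean per-joint position error.

    tracked, reference: lists of frames; each frame is a list of (x, y, z)
    joint positions, with joints in the same order in both inputs.
    """
    total, count = 0.0, 0
    for frame_t, frame_r in zip(tracked, reference):
        for p, q in zip(frame_t, frame_r):
            total += math.dist(p, q)  # Euclidean distance for one joint
            count += 1
    return total / count
```

For example, one frame with two joints that are off by 0.03 and 0.04 respectively gives an MPJPE of 0.035 in the same unit.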

Real-World Versatility

The true test was the Unitree G1 humanoid, where OmniTrack transferred zero-shot from simulation to the real robot:

  • Extreme Agility: Executing consecutive side flips and cartwheels.
  • Endurance: Running outdoors for 60 minutes straight until the battery died.
  • Teleoperation: Using a VR headset to control the robot in real time, where the Stage 1 filter smoothed out the shaky human inputs before they reached the robot's motors.

Table 1 (Experimental Results): Tracking error comparison. Note the high alignment between simulation and real-world performance.

Critical Insight: Why Does This Matter?

The core takeaway is that Data Quality > Algorithm Complexity. By spending computational effort to "clean" the reference motions in Stage 1, we make the reinforcement learning task in Stage 2 significantly easier. This suggests that the future of general-purpose humanoids lies in building "Motion Foundation Models" that understand the relationship between human intent and robot dynamics, rather than just raw imitation.

Limitations & Future Work

While OmniTrack is a leap forward for motion tracking, it still relies on a simulator for the Stage 1 rollout. Future iterations might move this "physical filtering" into a generative world model, allowing robots to adapt to new environments (like mud or ice) on the fly without needing a pre-defined simulator rollout for every new motion.


Senior Editor's Note: OmniTrack elegantly proves that in the quest for General AI in robotics, we must respect the "physics" as much as the "intelligence." It’s not just about what to do, but what is possible to do.
