Quality over Quantity (QoQ) is a data-centric robot learning framework that curates high-quality demonstrations using Influence Functions. By defining data quality as the contribution of a training sample to reducing validation loss, QoQ achieves state-of-the-art performance, outperforming baselines by up to 30 percentage points in real-world success rate.
TL;DR
Training robots on "more" data is often counterproductive if the data is noisy or suboptimal. QoQ (Quality over Quantity) introduces a systematic way to prune robot demonstration datasets using Influence Functions. Instead of guessing which data is "good" based on visual similarity, QoQ estimates how much each trajectory reduces the model's loss on a target task. The reported results show gains of up to 30 percentage points in success rate from training on less but better data.
The Curation Crisis in Robot Learning
The current "scaling law" obsession in robotics assumes that more teleoperated demonstrations lead to better generalist robot policies. However, human operators are inconsistent: they hesitate, make mistakes, and take suboptimal paths.
Prior work attempted to solve this using:
- Retrieval-based methods: Finding data that "looks" like expert data in a latent space (VAE).
- Mutual Information: Selecting data with high state-action predictability.
The catch? These are proxy metrics. Just because a trajectory looks like an expert's doesn't mean it helps the neural network learn the underlying physics or logic of the task.
The Core Insight: Influence as Quality
The authors argue that quality = contribution to performance. If adding a specific demonstration helps the robot achieve a lower loss on a "perfect" validation set, that data is high quality.
Mapping the Math to Intuition
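QoQ uses a first-order approximation of Influence Functions. Computing exact influence for every data point normally requires the inverse of the Hessian (the matrix of second-order derivatives of the loss with respect to the parameters), which is computationally infeasible for modern Vision-Language-Action (VLA) models with billions of parameters.
QoQ bypasses this using normalized gradient inner products, that is, the cosine similarity between the loss gradient of a training sample and the loss gradient of a validation sample.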
Instead of averaging influence across the whole validation set (which adds noise), QoQ looks for the Maximum Influence. It asks: "Is this training step highly relevant to AT LEAST ONE step in my 'perfect' demonstration?"
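For readers who want the scoring written out, the following is a sketch of how the classical influence function relates to the first-order, normalized, max-aggregated score described above. The notation ($\theta$ for parameters, $\ell$ for the per-sample loss, $z$ for a training step, $z'$ for a validation step, $\eta$ for the learning rate, $\mathcal{D}_{\text{val}}$ for the validation set) is assumed here for illustration and may differ from the paper's.

$$
\begin{aligned}
\text{classical influence:}\quad & \mathcal{I}(z, z') = -\,\nabla_\theta \ell(z';\theta)^\top H_\theta^{-1}\, \nabla_\theta \ell(z;\theta) \\
\text{one SGD step on } z\text{:}\quad & \ell\big(z';\, \theta - \eta \nabla_\theta \ell(z;\theta)\big) - \ell(z';\theta) \approx -\,\eta\, \nabla_\theta \ell(z';\theta)^\top \nabla_\theta \ell(z;\theta) \\
\text{normalized score:}\quad & s(z, z') = \frac{\nabla_\theta \ell(z;\theta)^\top \nabla_\theta \ell(z';\theta)}{\lVert \nabla_\theta \ell(z;\theta) \rVert \, \lVert \nabla_\theta \ell(z';\theta) \rVert} \\
\text{maximum influence:}\quad & q(z) = \max_{z' \in \mathcal{D}_{\text{val}}} s(z, z')
\end{aligned}
$$

The second line is where the Hessian drops out: to first order, a gradient step on $z$ lowers the loss on $z'$ exactly when the two gradients have a positive inner product, so a larger $s(z, z')$ means a more helpful training step.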
Fig 1: The QoQ pipeline. Gradients from the training data are compared to validation gradients. Green indicates a helpful "push" toward the desired behavior.
Two Key Innovations for Robotics
1. Maximum Influence Scoring
Unlike standard influence estimation in NLP, robot trajectories are long sequences of divergent behaviors (reaching, grasping, lifting). Averaging these signals washes out the useful information. By taking the max, QoQ identifies specific "critical moments" (like the exact moment of a grasp) where the training data provides the most value.
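The difference between max and mean aggregation can be illustrated with a minimal NumPy sketch. It assumes per-step gradients have already been flattened into rows of a matrix; the function names and the toy gradients are hypothetical and not taken from the paper's code.

```python
import numpy as np

def cosine_matrix(a, b, eps=1e-12):
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T

def max_influence_scores(train_grads, val_grads):
    """Score each training step by its best match to ANY validation step."""
    return cosine_matrix(train_grads, val_grads).max(axis=1)

def mean_influence_scores(train_grads, val_grads):
    """Baseline for comparison: average similarity over all validation steps."""
    return cosine_matrix(train_grads, val_grads).mean(axis=1)

# Toy gradients (assumed): two validation steps with orthogonal gradients,
# standing in for two different phases such as "reach" and "grasp".
val_grads = np.array([[1.0, 0.0], [0.0, 1.0]])
# First training step matches the "reach" phase; second opposes both phases.
train_grads = np.array([[2.0, 0.0], [-1.0, -1.0]])

max_scores = max_influence_scores(train_grads, val_grads)
mean_scores = mean_influence_scores(train_grads, val_grads)
print("max :", max_scores)
print("mean:", mean_scores)
```

In this toy case the first training step aligns perfectly with one validation phase, so its max score is about 1.0, while averaging over the unrelated phase halves it to about 0.5. That dilution is the "washing out" effect the max is meant to avoid.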
2. Trajectory-wise Curation
Neural networks are sensitive to distribution shifts. If you only pick the "best" individual state-action pairs, you end up with a dataset full of "grasping" frames but no "reaching" frames. This creates a broken policy. QoQ aggregates scores across a whole trajectory, ensuring the robot learns a coherent start-to-finish behavior.
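A trajectory-level selection step might look like the sketch below. The text only says that scores are aggregated per trajectory; using the mean as the aggregator, the `keep_fraction` parameter, and the function name are all assumptions made for illustration.

```python
import numpy as np

def curate_trajectories(step_scores, traj_ids, keep_fraction=0.5):
    """Keep whole trajectories, ranked by their mean per-step score.

    step_scores: (n_steps,) per-step quality scores.
    traj_ids:    (n_steps,) integer id of the trajectory each step belongs to.
    Returns (kept_ids, traj_scores) where traj_scores maps id -> mean score.
    """
    step_scores = np.asarray(step_scores, dtype=float)
    traj_ids = np.asarray(traj_ids)
    unique_ids, inverse = np.unique(traj_ids, return_inverse=True)
    sums = np.bincount(inverse, weights=step_scores)
    counts = np.bincount(inverse)
    means = sums / counts  # mean aggregation is an assumption
    n_keep = max(1, int(keep_fraction * len(unique_ids)))
    order = np.argsort(-means, kind="stable")  # highest mean first
    kept_ids = unique_ids[order[:n_keep]]
    return kept_ids, dict(zip(unique_ids.tolist(), means.tolist()))

# Toy data (assumed): trajectory 1 contains the single best step (0.95)
# but is poor overall, so step-wise selection would favor it unfairly.
step_scores = [0.9, 0.8, 0.7, 0.95, 0.1, 0.0, 0.6, 0.6, 0.2, 0.3]
traj_ids =    [0,   0,   0,   1,    1,   1,   2,   2,   3,   3]

kept_ids, traj_scores = curate_trajectories(step_scores, traj_ids, keep_fraction=0.5)
print("kept trajectories:", kept_ids.tolist())
print("trajectory scores:", traj_scores)
```

Because selection happens on whole trajectories, every kept demonstration still contains its reaching, grasping, and lifting phases, even if some of those steps scored low individually.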
Results: Turning 50% Success into 99%
The researchers tested QoQ against "Behavior Retrieval" and "Flow Retrieval" across simulations and real Franka robots.
- Simulation (Robomimic): In a "Can" task with a 50/50 mix of successful and failed demonstrations, QoQ identified the successful data with near 100% accuracy, while baselines struggled around 60%.
- Real-World (Banana Grasping): QoQ achieved an 86.7% success rate. The best baseline only hit 56.7%.
- In-the-Wild (DROID Dataset): Even on highly heterogeneous data with different camera angles and environments, QoQ successfully filtered "clean" demonstrations.
Fig 2: Success rate comparison. Notice how QoQ (green) consistently dominates across different task complexities.
Critical Analysis & Future Outlook
The beauty of QoQ is its implementation efficiency. By using techniques like OPORP (random projections) and only computing gradients for a few layers (like the action head), the authors made a theoretically "heavy" method practical for modern Vision-Language-Action (VLA) models like GR00T.
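To give a feel for why random projection makes gradient comparison cheap, here is a simplified count-sketch-style projection: one permutation, random sign flips, then sums over equal-size bins. This is the general idea behind OPORP as commonly described, but the code is an illustrative stand-in and is not claimed to match the authors' implementation; the function name and parameters are hypothetical.

```python
import numpy as np

def sketch_project(grads, k, seed=0):
    """Compress (n, d) gradients to (n, k) while roughly preserving inner products.

    The same seed must be used for training and validation gradients so that
    both are projected with the same permutation and signs.
    """
    grads = np.asarray(grads, dtype=float)
    n, d = grads.shape
    rng = np.random.default_rng(seed)
    perm = rng.permutation(d)                  # one shared permutation
    signs = rng.choice([-1.0, 1.0], size=d)    # one shared random sign vector
    x = grads[:, perm] * signs
    pad = (-d) % k                             # zero-pad so d divides evenly into k bins
    x = np.pad(x, ((0, 0), (0, pad)))
    return x.reshape(n, k, -1).sum(axis=2)     # sum within each bin

# Toy usage (assumed sizes): 4 gradients of dimension 1000 compressed to 64 numbers each.
rng = np.random.default_rng(123)
g = rng.normal(size=(4, 1000))
g_small = sketch_project(g, k=64)
print(g_small.shape)
```

Inner products between projected vectors match the originals in expectation, so cosine scores can be computed on the small vectors instead of the full gradients. Restricting gradients to a few layers, such as the action head, shrinks `d` before projection even starts.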
Limitations:
- It still requires a "clean" validation set (though the authors showed you can even use failed policy rollouts if you flip the sign of the influence).
- Computing gradients is still more expensive than simple visual retrieval.
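The sign-flip idea from the first limitation can be sketched as follows. The text does not say where the flip is applied relative to the max, so negating the final score is an assumption here, as are the function name and toy gradients.

```python
import numpy as np

def scores_from_failures(train_grads, fail_grads, eps=1e-12):
    """Score training steps against gradients from FAILED rollouts.

    A training step that pushes the model toward a failure gets a low score,
    so the max similarity to any failure step is negated (assumed placement).
    """
    t = train_grads / (np.linalg.norm(train_grads, axis=1, keepdims=True) + eps)
    f = fail_grads / (np.linalg.norm(fail_grads, axis=1, keepdims=True) + eps)
    sim = t @ f.T                 # (n_train, n_fail) cosine similarities
    return -sim.max(axis=1)       # flipped sign: similarity to failure is bad

# Toy gradients (assumed): one failure direction; the first training step
# points the same way, the second points the opposite way.
fail_grads = np.array([[1.0, 0.0]])
train_grads = np.array([[1.0, 0.0], [-1.0, 0.0]])

flipped_scores = scores_from_failures(train_grads, fail_grads)
print(flipped_scores)
```

With the flip, the training step that opposes the failure direction ranks above the one that reinforces it, which is the ordering a curation step would want.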
The Takeaway: As we move toward "Robot Foundation Models," the bottleneck is no longer data quantity—it's the signal-to-noise ratio. QoQ provides the mathematical filter needed to ensure our robots learn from the best of us, not our mistakes.
