This paper investigates the continual learning (CL) capabilities of large-scale pretrained Vision-Language-Action (VLA) models like Pi0 and GR00T. It demonstrates that unlike smaller models, VLAs are remarkably resistant to catastrophic forgetting when using a simple Experience Replay (ER) strategy, achieving near-zero or even positive backward transfer on the LIBERO benchmark.
TL;DR
Conventional wisdom in robotics suggests that robots are "forgetful" students—learning a new task typically means losing the old one. This paper turns that notion on its head. By evaluating large Vision-Language-Action (VLA) models (like Pi0 and GR00T), researchers found these models are naturally resistant to catastrophic forgetting. With just a tiny "memory" (Experience Replay), they can maintain—and sometimes even improve—performance on old tasks while mastering new ones.
The Bottom Line: Pretraining is the ultimate "stability" buffer. It allows models to store knowledge in a way that remains accessible even when task-level performance temporarily dips.
The Stability-Plasticity Dilemma
In robotics, Continual Learning (CL) has always been a battle between Stability (don't forget Task A) and Plasticity (learn Task B quickly). Small models trained from scratch usually fail this; they have a "shallow" understanding, so new weight updates easily overwrite old logic.
Historically, we fought this with:
- Massive Replay Buffers: Keeping 20-50% of all old data.
- Complex Regularization: Like EWC (Elastic Weight Consolidation), which penalizes changes to the weights that mattered most for earlier tasks (a soft constraint rather than a literal "freeze").
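To make the regularization idea concrete, here is a minimal sketch of the standard EWC penalty: a quadratic cost on moving each weight away from its value after the old task, scaled by a per-weight importance score (the diagonal Fisher information). The function name and the flat-list representation of parameters are illustrative assumptions, not code from the paper.

```python
def ewc_penalty(params, anchor_params, fisher, lam=1.0):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

    params        -- current weights (flat list of floats; illustrative only)
    anchor_params -- weights saved after training on the old task
    fisher        -- per-weight importance (diagonal Fisher estimate)
    lam           -- strength of the penalty
    """
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


# Moving an "important" weight (Fisher = 10.0) costs far more than
# moving an unimportant one (Fisher = 0.5) by the same amount.
cost_important = ewc_penalty([2.0, 1.0], [1.0, 1.0], [10.0, 0.5], lam=2.0)
cost_unimportant = ewc_penalty([1.0, 2.0], [1.0, 1.0], [10.0, 0.5], lam=2.0)
```

This penalty is added to the new task's loss, so important weights can still move, just at a price.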
But as we move into the era of Foundation Models for robotics, does this still hold?
Methodology: Putting VLAs to the Test
The researchers tested two heavyweight VLAs—Pi0 (a Flow Matching model) and GR00T N1.5—against standard BC-Transformers and Diffusion Policies across the LIBERO benchmark tasks.
Fig 1: Success matrices showing the dramatic difference in stability. While the small BC-Transformer (bottom) sees its performance vanish (darker colors) as it learns new tasks, the VLA (top) maintains high success rates (bright yellow) across the board.
The "Surprising" Effectiveness of Simple Replay
The most shocking finding: Experience Replay (ER) is all you need. When using a buffer size of just 2%—where smaller models completely collapse—VLAs maintained near-perfect performance. Some even showed Positive Backward Transfer, meaning learning a later task actually made them better at an earlier one.
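For readers who have not implemented it, Experience Replay can be sketched in a few lines: keep a small random slice of each old task's demonstrations and mix it into every batch while training on the new task. The helper names, the per-task sampling, and the `replay_ratio` mixing parameter below are assumptions made for illustration; the summary above does not specify the paper's exact batching scheme.

```python
import random


def build_replay_buffer(old_task_data, fraction=0.02, seed=0):
    """Keep a random `fraction` of each old task's demos (at least one per task)."""
    rng = random.Random(seed)
    buffer = []
    for task_id, demos in old_task_data.items():
        keep = max(1, round(len(demos) * fraction))
        buffer.extend((task_id, demo) for demo in rng.sample(demos, keep))
    return buffer


def sample_batch(new_task_demos, replay_buffer, batch_size, replay_ratio=0.5, seed=0):
    """Mix new-task demos with replayed old-task demos in one training batch."""
    rng = random.Random(seed)
    # Never request more replay items than the buffer holds.
    n_replay = min(len(replay_buffer), int(batch_size * replay_ratio))
    batch = rng.choices(new_task_demos, k=batch_size - n_replay)
    batch += rng.sample(replay_buffer, n_replay)
    rng.shuffle(batch)
    return batch


# Toy data: two old tasks with 100 and 50 demos, one new task with 10.
old = {"task_a": list(range(100)), "task_b": list(range(50))}
buffer = build_replay_buffer(old, fraction=0.02)
new_demos = [("new", i) for i in range(10)]
batch = sample_batch(new_demos, buffer, batch_size=8, replay_ratio=0.25)
```

With a 2% fraction the buffer here holds only three demonstrations, which gives a sense of how little old data the VLAs reportedly need.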
Deep Dive: Why Does This Work?
1. Pretraining is the Key Factor
By comparing Pi0 variants (Pretrained vs. From Scratch), the authors show that pretraining pushes the stability-plasticity trade-off to a "Pareto Frontier" that small models can't touch. Pretraining doesn't just help you learn faster (forward transfer); it acts as a structural anchor that prevents weight updates from drifting into "garbage" territory for old tasks.
Fig 2: The gap between "Pretrained" and "From Scratch" models widens as the replay buffer gets smaller, indicating that pretraining mitigates forgetting precisely when data is scarce.
2. Knowledge is "Dormant" Not "Dead"
Perhaps the most profound insight is the Recovery Efficiency. When a VLA looks like it has forgotten a task (0% success rate), the underlying knowledge isn't gone—it's just "misaligned."
- The Probe: Re-finetune the model on the "forgotten" task.
- The Result: The VLA recovers peak performance in less than 10% of the original training steps.
- Small Models: Take 100% or more of the original training steps to relearn, suggesting the knowledge was truly "erased."
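The recovery probe above boils down to one ratio: how many re-finetuning steps it takes to get back to a target success rate, divided by the steps the task originally took to learn. A minimal sketch follows; the function name, the evaluation interval, and the toy numbers are all hypothetical and are not results from the paper.

```python
def recovery_fraction(success_curve, eval_every, original_steps, target):
    """Fraction of the original training budget needed to reach `target` again.

    success_curve[k] is the success rate measured after (k + 1) * eval_every
    re-finetuning steps. Returns None if the target is never reached.
    """
    for k, success_rate in enumerate(success_curve):
        if success_rate >= target:
            return (k + 1) * eval_every / original_steps
    return None


# Toy curves (made-up numbers): one model recovers quickly, one never does.
fast = recovery_fraction([0.2, 0.7, 0.9, 0.95], eval_every=100,
                         original_steps=5000, target=0.9)
never = recovery_fraction([0.0, 0.1, 0.2], eval_every=100,
                          original_steps=5000, target=0.9)
```

A value well below 1.0 points to "dormant" knowledge, while a value near or above 1.0 means the model is effectively learning the task again from scratch.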
3. Anatomical Forgetting
Through Component Swapping, the team found that the Vision-Language (VL) backbone is the primary source of forgetting (it's where the world representations live), while the Action Head is more consistent across tasks.
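One plausible way to run such a swap, sketched under assumptions: treat each checkpoint as a dictionary of named parameters, copy one component from a checkpoint saved right after the old task into the final model, then re-evaluate on the old task. The parameter-name prefixes `vl_backbone.` and `action_head.` are hypothetical, and the paper's actual procedure may differ.

```python
def swap_component(base_state, donor_state, prefix):
    """Copy of base_state with every parameter named `prefix...` taken from donor_state."""
    swapped = dict(base_state)
    for name, value in donor_state.items():
        if name.startswith(prefix):
            swapped[name] = value
    return swapped


# Toy checkpoints: plain numbers stand in for weight tensors.
after_task_1 = {"vl_backbone.w": 1.0, "action_head.w": 10.0}
after_task_5 = {"vl_backbone.w": 2.0, "action_head.w": 20.0}

# Final model, but with the backbone restored from the task-1 checkpoint.
# If task-1 success comes back, the backbone was where the forgetting happened.
restored_backbone = swap_component(after_task_5, after_task_1, "vl_backbone.")
restored_head = swap_component(after_task_5, after_task_1, "action_head.")
```

The same helper works on any framework's name-to-tensor mapping, since it only inspects parameter names.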
Critical Analysis & Conclusion
Takeaways
- Simplicity Wins: For VLAs, we don't need exotic CL algorithms. Simple Replay + Foundation Model = SOTA Continual Learning.
- Pretraining = Insurance: Large-scale data pretraining is not just for "zero-shot" performance; it's a fundamental requirement for long-term robot autonomy.
Limitations
While VLAs are resistant, they are not immune. In the most diverse scenes (LIBERO-10), total forgetting still occurs if the replay buffer is essentially zero (e.g., only 10 samples). Additionally, the "Recovery" ability implies we might need a system that can "self-correct" or "quick-tune" rather than just relying on a static policy.
Future Outlook
This work suggests that the path to a lifelong-learning robot isn't through more complex anti-forgetting math, but through better representation reuse. If the knowledge is already there, we just need the right "trigger" to bring it back to the surface.
Table 1: Note how Pi0 and GR00T consistently maintain high Success Rates (SR) and low Negative Backward Transfer (NBT) compared to all other baselines.
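For reference, backward transfer can be read directly off a success matrix like the ones in Fig 1. The sketch below uses the common textbook definition: compare each task's success after the final training stage with its success right after it was first learned, then average. The paper's NBT metric may be defined differently (for example, averaged over intermediate stages or with the sign flipped), so treat this as an illustration with made-up numbers rather than a reproduction of Table 1.

```python
def backward_transfer(R):
    """Average change in success on earlier tasks after training on all tasks.

    R[i][j] is the success rate on task j after training through task i.
    Negative values indicate forgetting; positive values indicate that later
    training improved earlier tasks.
    """
    num_tasks = len(R)
    if num_tasks < 2:
        raise ValueError("need at least two tasks to measure backward transfer")
    last = num_tasks - 1
    return sum(R[last][j] - R[j][j] for j in range(last)) / last


# Toy 3-task success matrix (made-up numbers).
R = [
    [0.90, 0.00, 0.00],
    [0.80, 0.95, 0.00],
    [0.85, 0.90, 0.92],
]
bwt = backward_transfer(R)  # ((0.85 - 0.90) + (0.90 - 0.95)) / 2
```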
