Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving

[ICLR 2026] Hyper Diffusion Planner: Scaling Diffusion Models for Real-World Autonomous Driving

Abstract

This paper introduces Hyper Diffusion Planner (HDP), a large-scale end-to-end autonomous driving (E2E AD) framework that utilizes a diffusion-based decoder for trajectory planning. Evaluated via 200 km of real-world road testing, HDP achieves a 10x performance improvement over baseline diffusion planners by optimizing loss space, trajectory representation, and data scaling.

TL;DR

The Hyper Diffusion Planner (HDP) is a breakthrough in End-to-End (E2E) Autonomous Driving that transitions diffusion-based planning from "simulation-only" to "real-road-ready." By systematically optimizing the diffusion loss space and introducing a mathematically grounded Hybrid Loss (coupling velocity and waypoints), the researchers achieved a 10x performance boost in closed-loop real-world testing (200 km).

Problem & Motivation: The Gap Between Math and Asphalt

While diffusion models are state of the art for image generation and robotic manipulation, applying them to autonomous driving (AD) reveals three critical pain points:

Jitter vs. Geometry: Models supervised on waypoints capture the path well but produce jerky, un-drivable velocity profiles.
Mode Collapse: On small datasets (like 100k frames), diffusion planners often fail to show their famous multi-modality, behaving like simple regression models.
Safety Gap: Imitation learning blindly copies human behavior, including mistakes, and lacks a mechanism to prioritize "not crashing" in long-tail scenarios.

Methodology: The Core Innovations

1. Re-thinking the Loss Space

Most diffusion models predict the noise ($\epsilon$). However, HDP finds that in AD, trajectories live on a low-dimensional manifold. Predicting the clean data ($\tau_{0}$) directly leads to faster convergence and eliminates high-frequency artifacts common in $\epsilon$-prediction.
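The difference between the two prediction targets can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the linear noise schedule, the array shapes, and every function name below are assumptions chosen for the example. It shows the standard forward process, how a noise prediction maps back to the clean trajectory it implies, and how an error in the noise estimate is scaled when converted to an error in the clean trajectory.

```python
import numpy as np

def make_alpha_bar(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention terms for a linear beta schedule (assumed schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(tau0, eps, alpha_bar_t):
    """Forward process: tau_t = sqrt(a) * tau0 + sqrt(1 - a) * eps."""
    return np.sqrt(alpha_bar_t) * tau0 + np.sqrt(1.0 - alpha_bar_t) * eps

def eps_to_tau0(tau_t, eps_hat, alpha_bar_t):
    """Clean trajectory implied by a noise prediction (inverts add_noise)."""
    return (tau_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

def tau0_prediction_loss(tau0_hat, tau0):
    """Clean-data objective: mean squared error against the clean trajectory."""
    return float(np.mean((tau0_hat - tau0) ** 2))

def eps_prediction_loss(eps_hat, eps):
    """Noise objective: mean squared error against the sampled noise."""
    return float(np.mean((eps_hat - eps) ** 2))

def eps_error_gain(alpha_bar_t):
    """Factor by which an error in eps_hat is scaled in the implied tau0."""
    return float(np.sqrt(1.0 - alpha_bar_t) / np.sqrt(alpha_bar_t))

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
tau0 = rng.normal(size=(8, 2))      # hypothetical trajectory: 8 future steps of (x, y)
eps = rng.normal(size=tau0.shape)   # sampled Gaussian noise
t = 50
tau_t = add_noise(tau0, eps, alpha_bar[t])

# With a perfect noise estimate, the implied clean trajectory is recovered exactly.
recovered = eps_to_tau0(tau_t, eps, alpha_bar[t])
```

The gain returned by `eps_error_gain` grows with the noise level, so a fixed error in the noise estimate translates into a larger error in the implied trajectory at later diffusion steps. That is one common intuition for preferring a clean-data target; the source does not state whether it is the authors' argument.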

2. The Hybrid Loss (Velocity + Waypoints)

To solve the "jitters," the authors predict velocity but supervise on both velocity and waypoints. They mathematically prove that this formulation—termed a P-norm Score Matching loss—is unbiased and maintains the integrity of the data distribution while ensuring both global geometric accuracy and local kinematic smoothness.
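One way to read the hybrid loss is sketched below. It is an interpretation under stated assumptions, not the authors' code: waypoints are assumed to come from the predicted velocities by cumulative summation with a fixed time step `dt`, and the names `hybrid_loss`, `hybrid_norm_matrix`, and `waypoint_weight` are hypothetical. Under that assumption, the sum of a velocity error and a waypoint error equals a single quadratic form in the velocity error, which is one plausible meaning of a "P-norm".

```python
import numpy as np

def integrate_velocity(v, dt):
    """Waypoints from velocities by cumulative summation, starting at the origin."""
    return np.cumsum(v, axis=0) * dt

def hybrid_loss(v_hat, v, dt, waypoint_weight=1.0):
    """Squared velocity error plus weighted squared error of the integrated waypoints."""
    vel_err = v_hat - v
    wp_err = integrate_velocity(v_hat, dt) - integrate_velocity(v, dt)
    return float(np.sum(vel_err ** 2) + waypoint_weight * np.sum(wp_err ** 2))

def hybrid_norm_matrix(horizon, dt, waypoint_weight=1.0):
    """P = I + w * A^T A, where A is the lower-triangular integration matrix."""
    A = dt * np.tril(np.ones((horizon, horizon)))
    return np.eye(horizon) + waypoint_weight * A.T @ A

def weighted_norm_loss(v_hat, v, P):
    """Quadratic form e^T P e, summed over the x and y coordinates."""
    e = v_hat - v                      # shape (horizon, 2)
    return float(np.sum(e * (P @ e)))

rng = np.random.default_rng(0)
horizon, dt = 8, 0.5                   # hypothetical horizon and time step
v = rng.normal(size=(horizon, 2))      # ground-truth velocities
v_hat = v + 0.1 * rng.normal(size=(horizon, 2))

P = hybrid_norm_matrix(horizon, dt)
loss_two_terms = hybrid_loss(v_hat, v, dt)
loss_quadratic = weighted_norm_loss(v_hat, v, P)
```

In this sketch `loss_two_terms` and `loss_quadratic` agree up to floating-point error, and `P` is symmetric positive definite. A single velocity output can therefore be supervised for both local smoothness and global geometry without a second prediction head.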

Fig 1: The HDP architecture, featuring a perception backbone and a Transformer-based diffusion decoder.

3. Safety-Aware RL Post-Training

To refine the model without expensive online real-vehicle RL, HDP uses a "pseudo-closed-loop" simulation. It applies Reward-Weighted Regression: $L_{RL} = \mathbb{E}\left[\exp(\beta r)\, \| v_{\theta} - v \|_{P}^{2}\right]$. This "up-weights" safe trajectories in the training data, aligning the model with safety constraints without requiring complex gradient backpropagation through the denoising chain.
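The reward-weighted regression objective can be sketched as follows. This is a minimal illustration under assumptions, not the paper's training code: the reward values, the batch layout, the use of an identity matrix for `P`, and the function names are all hypothetical. Subtracting the maximum reward and normalising the weights to mean 1 are numerical conveniences added here. They rescale the loss by a batch-level constant but leave the relative weighting of samples unchanged.

```python
import numpy as np

def rwr_weights(rewards, beta):
    """exp(beta * r) weights, stabilised by subtracting the max and scaled to mean 1."""
    rewards = np.asarray(rewards, dtype=float)
    w = np.exp(beta * (rewards - rewards.max()))
    return w / w.mean()

def weighted_sq_norm(err, P):
    """Per-sample quadratic form e^T P e, summed over the x and y coordinates."""
    # err has shape (batch, horizon, 2); P has shape (horizon, horizon).
    return np.einsum("btc,ts,bsc->b", err, P, err)

def rwr_loss(v_hat, v, rewards, beta, P):
    """Regression loss in which higher-reward samples count more."""
    per_sample = weighted_sq_norm(v_hat - v, P)
    return float(np.mean(rwr_weights(rewards, beta) * per_sample))

rng = np.random.default_rng(0)
batch, horizon = 4, 8
v = rng.normal(size=(batch, horizon, 2))       # trajectories scored in simulation
v_hat = v + 0.1 * rng.normal(size=v.shape)     # model predictions
rewards = np.array([-1.0, 0.0, 0.5, 1.0])      # hypothetical safety rewards
P = np.eye(horizon)                            # placeholder norm matrix

weights = rwr_weights(rewards, beta=2.0)
loss = rwr_loss(v_hat, v, rewards, beta=2.0, P=P)
```

Because the weights depend only on the rewards, the loss stays an ordinary regression on fixed targets and no gradient has to flow through the sampling chain. Setting `beta` to 0 recovers the unweighted loss, and larger values concentrate training on the highest-reward trajectories.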

Experiments & Results: The Power of Scaling

The most striking result is that multi-modal behavior emerges with data scale. While benchmarks like NAVSIM suggest diffusion models don't show multimodality, the HDP results indicate that this is a data volume issue.

Scaling Multi-modality: Diversified behaviors only emerge after crossing the ~10M frame threshold.
Real-Vehicle Performance: Scaling from 10M to 70M frames improved success rates by over 20%.

Table 1: Step-by-step performance gains from Base Model to HDP-RL.

In 200 km of urban testing, HDP handled complex "Navigational Lane Changes" and "VRU Yielding" with human-like smoothness, which was previously a major weakness for E2E learning models.

Fig 2: Snapshots of HDP successfully performing unprotected turns and yielding to cross traffic.

Critical Analysis & Conclusion

Takeaway: HDP demonstrates that successful E2E AD doesn't require complex, hand-crafted heuristics (like anchor trajectories). Instead, it requires a theoretically sound loss function and significant data scale.

Limitations:

The current RL reward focuses primarily on safety, which can sometimes lead to overly "conservative" driving (e.g., waiting too long at intersections).
Future work needs to balance safety with traffic efficiency to make the agent more assertive in dense traffic.

Future Outlook: HDP sets a new baseline for "Generalizable AD." By showing that diffusion models scale as well as LLMs, it opens the door for Large Foundation Models in the physical world.
