Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving

[CVPR 2024] Drive My Way: Aligning VLAs with Personalized Driving Styles

Abstract

The paper introduces Drive My Way (DMW), a personalized Vision-Language-Action (VLA) driving framework that aligns autonomous vehicle behavior with both long-term driver habits and real-time natural language instructions. By leveraging a novel Personalized Driving Dataset (PDD) and reinforcement fine-tuning with Group Relative Policy Optimization (GRPO), DMW achieves state-of-the-art style adaptation on closed-loop benchmarks such as Bench2Drive, significantly outperforming generic end-to-end (E2E) models.

TL;DR

Most autonomous vehicles drive like robots: predictable, rigid, and indifferent to whether you are in a rush or enjoying a leisurely Sunday drive. Drive My Way (DMW) changes this by introducing a Vision-Language-Action (VLA) framework that learns your long-term habits and follows your real-time commands. By combining user embeddings with a residual policy tuned via reinforcement learning, DMW achieves a "human-in-the-loop" experience that mimics individual driving styles without compromising safety.

The Motivation: Why "One Size Fits All" Fails

Standard end-to-end (E2E) driving models are trained on "expert" data to find a single optimal path for safety and efficiency. However, human driving is inherently subjective:

The "Late for Work" Dilemma: An aggressive driver might take a tight gap in a merge.
The "Cautious Commuter": A conservative driver prefers a large Time-to-Collision (TTC) buffer.
The Intent Gap: Current systems can't translate a command like "I'm tired, take it easy" into a specific adjustment of braking pressure or steering smoothness.

DMW addresses this by treating driving not just as a geometric problem, but as a preference alignment problem.

Methodology: The Architecture of Personality

The core of DMW is its ability to fuse long-term "who you are" (Profile) with short-term "what you want" (Instruction).

1. Long-term Preference Encoder

The researchers built the Personalized Driving Dataset (PDD), involving 30 real drivers. They used a contrastive learning objective (InfoNCE) to align a driver's text profile (experience, habits) with their actual driving trajectories. This creates a User Embedding ($z_p$) that acts as a latent personality prior.
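To make the contrastive step concrete, here is a minimal NumPy sketch of a symmetric InfoNCE loss over a batch of matched (profile, trajectory) embedding pairs. It is an illustration under stated assumptions, not the authors' implementation: the function name `info_nce`, the symmetric form, the temperature of 0.07, and the use of precomputed embeddings are all hypothetical choices.

```python
import numpy as np

def info_nce(profile_emb, traj_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched pairs.

    profile_emb, traj_emb: arrays of shape (B, D); row i of each belongs
    to the same driver. The temperature value is an assumption.
    """
    # L2-normalize so that dot products are cosine similarities.
    p = profile_emb / np.linalg.norm(profile_emb, axis=1, keepdims=True)
    t = traj_emb / np.linalg.norm(traj_emb, axis=1, keepdims=True)
    logits = p @ t.T / temperature  # (B, B); the diagonal holds the positives

    def cross_entropy_diag(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_prob = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the profile-to-trajectory and trajectory-to-profile directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

# Toy check: correctly paired embeddings should score a lower loss
# than the same embeddings paired with the wrong driver.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
loss_aligned = info_nce(x, x)
loss_shuffled = info_nce(x, x[::-1].copy())
```

Minimizing this loss pulls each driver's profile embedding toward that driver's own trajectories and away from everyone else's, which is what lets $z_p$ serve as a driver-specific prior.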

2. The Residual Decoder Strategy

Rather than forcing the model to learn driving from scratch, DMW uses a SimLingo VLA backbone to generate a "Safe Base Action." It then adds a Residual Decoder that predicts small adjustments ($\Delta$) to speed and steering based on the user embedding and language prompt.
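The residual idea can be sketched in a few lines of Python. This is a hedged illustration only: the `Action` type, the `apply_residual` helper, and the clipping limits `max_dspeed` and `max_dsteer` are assumptions introduced here to show how a bounded $\Delta$ keeps the final action close to the safe base action. The summary does not say how DMW bounds its residuals.

```python
from dataclasses import dataclass

@dataclass
class Action:
    speed: float  # target speed in m/s
    steer: float  # normalized steering in [-1, 1]

def _clip(value, limit):
    """Clamp value to the interval [-limit, limit]."""
    return max(-limit, min(limit, value))

def apply_residual(base, delta_speed, delta_steer,
                   max_dspeed=2.0, max_dsteer=0.1):
    """Add a bounded style residual to the base action.

    The limits are hypothetical; they only illustrate that the
    personalization is a small correction, not a new policy.
    """
    speed = max(0.0, base.speed + _clip(delta_speed, max_dspeed))
    steer = _clip(base.steer + _clip(delta_steer, max_dsteer), 1.0)
    return Action(speed=speed, steer=steer)

# An "aggressive" residual asks for +5 m/s, but only +2 m/s is applied.
base_action = Action(speed=10.0, steer=0.0)
final_action = apply_residual(base_action, delta_speed=5.0, delta_steer=-0.5)
```

Because the residual is bounded, a zero residual recovers the backbone's behavior exactly, and even an extreme style request moves the action only a limited distance from it.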

[Figure: Model architecture]

3. Style-Aware Reward Adaptation

To fine-tune this with RL (specifically GRPO), the team used an LLM (GPT-5 equivalent) to dynamically adjust reward weights ($w_s, w_e, w_c$). If a user says "I'm in a rush," the efficiency weight ($w_e$) increases, while the safety threshold ($\beta_{safety}$) becomes more "permissive" within safe legal bounds.
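As a rough sketch of how style-dependent weights interact with GRPO, the Python below combines three reward terms with weights $(w_s, w_e, w_c)$ and then computes group-relative advantages by normalizing rewards within a group of sampled rollouts. Several things here are assumptions: reading $w_s$ and $w_c$ as safety and comfort weights (the text only confirms $w_e$ as efficiency), the linear form of the reward, and every numeric value. In DMW the weights come from an LLM, whereas here they are hard-coded.

```python
import numpy as np

def style_reward(r_safety, r_efficiency, r_comfort, w_s, w_e, w_c):
    """Weighted sum of reward terms (the linear form is an assumption)."""
    return w_s * r_safety + w_e * r_efficiency + w_c * r_comfort

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize rewards within one group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Hypothetical (safety, efficiency, comfort) scores for three rollouts
# sampled from the same scene.
rollouts = [(1.0, 0.2, 0.9),   # slow and smooth
            (0.9, 0.8, 0.5),   # fast and less smooth
            (1.0, 0.5, 0.7)]   # in between

# Hypothetical weight sets standing in for the LLM's output.
relaxed = dict(w_s=1.0, w_e=0.5, w_c=1.5)  # e.g. "I'm tired, take it easy"
rushed = dict(w_s=1.0, w_e=2.0, w_c=0.5)   # e.g. "I'm in a rush"

adv_relaxed = grpo_advantages([style_reward(*r, **relaxed) for r in rollouts])
adv_rushed = grpo_advantages([style_reward(*r, **rushed) for r in rollouts])
```

With the relaxed weights the slow, smooth rollout receives the largest advantage, and with the rushed weights the fast rollout does. The same group of samples therefore pushes the policy in different directions depending on the instruction.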

Experimental Results: Does it actually drive like "Me"?

The model was tested in the CARLA simulator on the Bench2Drive benchmark.

Instruction Following: DMW's behavior shifted markedly between styles. Under "Aggressive" instructions, average speed and acceleration increased significantly, while "Conservative" instructions led to much larger headways and smoother braking.
Safety vs. Style: Unlike previous models that lose safety when pushed to be aggressive, DMW maintained a high Success Rate (SR) because the personalization is a residual on top of a safe foundation.
Human Recognition: In user studies, evaluators could identify which driver the AI was mimicking with high accuracy (Alignment Score of 0.92).

[Figure: Experimental results comparison]

Deep Insights & Future Outlook

The "Residual" approach is the most brilliant part of this work. By decoupling the fundamental task of driving (safety/navigation) from the style of driving (preference), the authors avoid the catastrophic forgetting typical in multi-task learning.

Limitations: Currently, this is validated only in simulation (CARLA). The real challenge lies in sim-to-real transfer: how do we capture the "feel" of a real steering wheel's haptic feedback and translate it into a VLA embedding?

DMW makes the case that the future of autonomous driving isn't just about reaching the destination; it's about how you get there. By treating language as a first-class citizen in the control loop, we move closer to vehicles that truly understand their passengers.
