[CoRL 2024] DexDrummer: Mastering High-Speed, Contact-Rich Robotic Drumming
Abstract

DexDrummer is a hierarchical bimanual robotic framework designed for long-horizon, contact-rich drumming tasks. It combines a high-level residual RL policy for trajectory planning with a low-level dexterous policy that uses contact-targeted rewards, achieving state-of-the-art performance with an F1 score of 1.0 on real-world song execution.

TL;DR

Researchers from Stanford University have introduced DexDrummer, a hierarchical robotic system that enables a bimanual robot (using Franka arms and Tesollo dexterous hands) to play complex drum sequences. By combining motion planning with residual RL and a specialized contact curriculum, the system masters the triple challenge of in-hand stick stabilization, forceful external impacts, and long-horizon musical coordination.

The Challenge: Why Drumming is a Dexterity "Stress Test"

Most dexterous manipulation tasks focus on a single axis of difficulty:

  • In-hand manipulation: Rotating a cube (e.g., OpenAI's Rubik's Cube hand).
  • Tool use: Using a pair of scissors or a drill.
  • Long-horizon: Pick-and-place sequences.

Drumming sits at the intersection of all three. Every strike on a drum pad introduces a massive external force that threatens to dislodge the stick. To play a 40-second song, the robot must constantly readjust its grip (reactive grasping) while moving its arms across a 5-piece drum kit. Prior methods that "fixed" the stick to the hand failed because they couldn't absorb the vibration or adjust for slippage.

Methodology: High-Level Planning meets Low-Level Reflexes

DexDrummer uses a two-tier architecture to solve the exploration problem in Reinforcement Learning (RL).

1. High-Level: Motion Primitives + Residual RL

The system doesn't learn "how to move to the snare" from scratch. Instead, it uses parameterized motion primitives to generate a base trajectory. A residual RL policy then adds small, high-frequency corrections. This allows the robot to handle the dynamic "bounce" of the drumstick that a pure kinematic planner would ignore.
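The primitive-plus-residual split above can be sketched in a few lines. The minimum-jerk primitive, the residual scale, and the `policy` callable are illustrative assumptions, not the paper's published implementation:

```python
import numpy as np

def primitive_trajectory(t, start, target, duration):
    """Hypothetical parameterized motion primitive: a minimum-jerk
    interpolation from the current pose to the target drum pad."""
    s = np.clip(t / duration, 0.0, 1.0)
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5  # minimum-jerk profile
    return start + blend * (target - start)

def hybrid_action(t, start, target, duration, policy, obs, residual_scale=0.05):
    """Base trajectory from the primitive plus a small learned correction.

    `policy` stands in for the trained residual RL policy; its output is
    scaled down so corrections stay small and high-frequency, letting the
    policy absorb stick bounce without overriding the planned motion.
    """
    base = primitive_trajectory(t, start, target, duration)
    return base + residual_scale * policy(obs)

# Toy usage: with a zero residual, the action reduces to the primitive.
zero_policy = lambda obs: np.zeros(3)
a = hybrid_action(0.5, np.zeros(3), np.ones(3), 1.0, zero_policy, obs=None)
```

Because the primitive already reaches the target, exploration only has to discover small corrections, which is what makes the RL problem tractable.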

2. Low-Level: Contact-Targeted Rewards

The "secret sauce" lies in how the agent is incentivized to hold the stick:

  • Fulcrum Reward: Encourages the thumb and index finger to pinch the stick's pivot point, mimicking human drummers.
  • Arm Energy Penalty: Penalizes large arm movements to force the "work" into the fingers, leading to higher-speed "wrist/finger" strokes.
  • Contact Curriculum: During the early stages of training, the drum pad's collision is disabled. This allows the RL agent to learn the motion of drumming without being frustrated by the stick constantly bouncing off the surface.
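The three ingredients above can be combined into a single reward function. This is a minimal sketch: the weights, the contact features, and the curriculum length are all assumptions for illustration, not the paper's published values:

```python
import numpy as np

def drumming_reward(thumb_contact, index_contact, fulcrum_contact,
                    arm_joint_vel, hit_error, step,
                    curriculum_steps=2_000_000):
    """Illustrative contact-targeted reward (weights are assumptions)."""
    # Fulcrum reward: thumb and index finger pinch the stick's pivot point.
    r_fulcrum = 1.0 if (thumb_contact and index_contact and fulcrum_contact) else 0.0

    # Arm energy penalty: pushes the "work" into the fingers.
    r_energy = -0.01 * float(np.sum(np.square(arm_joint_vel)))

    # Contact curriculum: while the drum pad's collision is disabled,
    # hit accuracy is not scored, so the agent first learns the motion.
    pad_collision_enabled = step >= curriculum_steps
    r_hit = -hit_error if pad_collision_enabled else 0.0

    return r_fulcrum + r_energy + r_hit
```

Gating the accuracy term on the curriculum stage mirrors the trick described above: early training rewards only grip and efficient motion, and precise striking is introduced once the pad becomes solid.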

Figure 1: The DexDrummer hierarchy, showing the pipeline from MIDI input to bimanual execution.

Experiments: Performance and Real-World Transfer

The authors tested DexDrummer across six musical genres.

  • Reactive vs. Fixed Grasp: The "Reactive Grasp" (active finger control) was significantly more robust. In "Easy" long-horizon tasks, it achieved an F1 score nearly double that of a fixed-grasp robot, which eventually lost control of the sticks.
  • Finger-Driven vs. Arm-Driven: When pushed to 240 BPM (4 hits per second), arm-driven policies became "clunky" and "dangerous," while the finger-driven policy maintained a tight trajectory with low energy consumption.

Table 1: The reward structure used to balance in-hand stability and task accuracy.

Real-World "Zero-Shot" Transfer

Despite being trained entirely in simulation, the policy transferred to the real world with only simple domain randomization. The robot landed every scheduled hit on seen songs (a perfect F1 score of 1.0) and adapted its grip strength automatically:

  • Cymbal hits: Loose grip (less vibration).
  • Drum pad hits: Firm grip (high rebound).
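The domain randomization behind this transfer can be sketched as sampling fresh physics parameters each training episode. The specific parameters and ranges below are illustrative assumptions, not the paper's configuration:

```python
import random

def randomized_sim_params():
    """Hypothetical per-episode randomization ranges for sim-to-real.

    Note the separate restitution ranges: pads rebound hard, cymbals
    absorb more energy, which is the variation the grip must adapt to.
    """
    return {
        "stick_mass_kg": random.uniform(0.04, 0.08),      # drumstick mass
        "stick_friction": random.uniform(0.6, 1.2),       # grip friction
        "pad_restitution": random.uniform(0.5, 0.9),      # hard rebound
        "cymbal_restitution": random.uniform(0.2, 0.5),   # softer rebound
        "joint_damping_scale": random.uniform(0.8, 1.2),  # actuator variation
        "obs_noise_std": random.uniform(0.0, 0.01),       # sensor noise
    }

# Each episode sees different physics, so the policy cannot overfit
# to one simulator configuration.
params = randomized_sim_params()
```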

Figure 2: Real-world deployment showing the hand adapting its grip based on the surface material.

Conclusion & Critical Insight

The brilliance of DexDrummer isn't just in making a robot play music; it's in the inductive bias provided to the RL agent. By forcing finger-dominant control through energy penalties and the fulcrum reward, the researchers replicated human biomechanical advantages in a robotic system.

While the system currently requires songs to be slowed down for complex bimanual coordination, it sets a new benchmark for combining long-horizon planning with fast, contact-rich reflexes.

Takeaway: Future robotic tools shouldn't just be "held"—they must be "manipulated" within the hand to truly master the physics of the task.
