[CVPR 2025] Spectral Scalpel: Sharpening Action Boundaries via Frequency-Selective Filtering

This paper introduces Spectral Scalpel, a novel frequency-domain filtering framework for Skeleton-based Temporal Action Segmentation (STAS). It achieves State-of-the-Art (SOTA) performance across five benchmarks (e.g., +4.8% F1@50 on PKU-MMD X-view) by selectively amplifying action-specific frequencies and suppressing shared spectral components.

TL;DR

Skeletal motion is more than just a sequence of coordinates—it's a symphony of joint oscillations. Current models for Skeleton-based Temporal Action Segmentation (STAS) often fail because their temporal aggregators (like TCNs) act as "filters" that smooth out the very differences needed to tell one action from another. Spectral Scalpel fixes this by performing "surgery" in the frequency domain, suppressing shared frequencies between adjacent actions to make transitions crystal clear.

The "Smoothing" Problem: Why SOTA Models Blur Transitions

Standard architectures for action segmentation focus on capturing long-term dependencies. However, these models (Transformers and TCNs alike) carry an inherent low-pass filtering bias. This averaging behavior is great for temporal consistency, but it erases the high-frequency nuances that distinguish the end of a "waving" action from the start of a "clapping" action.

The authors argue that visually similar actions often share a common low-frequency base but differ in their unique high-frequency signatures. If we cannot tell them apart in the time domain, we can look at their vibration patterns in the frequency domain instead.
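This intuition is easy to verify numerically. The toy example below (the frequencies and amplitudes are invented for illustration, not taken from the paper) builds two signals that share a slow base oscillation but differ in a fast component, and shows that their amplitude spectra diverge exactly at those fast bins:

```python
import numpy as np

# Two toy "actions" sharing a slow 1 Hz base motion but differing in a
# fast component: 6 Hz ("waving") vs 10 Hz ("clapping"). All numbers
# here are illustrative, not from the paper.
t = np.linspace(0, 2, 200, endpoint=False)   # 2 s sampled at 100 Hz
base = np.sin(2 * np.pi * 1 * t)
wave = base + 0.3 * np.sin(2 * np.pi * 6 * t)
clap = base + 0.3 * np.sin(2 * np.pi * 10 * t)

freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])
amp_wave = np.abs(np.fft.rfft(wave))
amp_clap = np.abs(np.fft.rfft(clap))

# Both spectra peak at the shared 1 Hz base...
assert freqs[np.argmax(amp_wave)] == 1.0
# ...but the spectrum *difference* peaks at the fast bins (6 or 10 Hz),
# never at the shared base frequency.
diff_peak = freqs[np.argmax(np.abs(amp_wave - amp_clap))]
```

In the time domain the two signals look nearly identical; in the frequency domain the shared component cancels and only the discriminative bins remain, which is exactly the property the method exploits.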

Methodology: The "Surgical" Toolkit

The paper introduces three core components to move the modeling bottleneck into the spectral space:

1. Multi-scale Adaptive Spectral Filter (MASF)

Acting as the "scalpel," this module transforms spatial features into the frequency domain using FFT. It applies learnable filters across multiple scales to selectively amplify or suppress specific frequency bins.

[Figure: Overall Architecture]
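The summary does not spell out MASF's exact parameterization, but its core mechanism, applying gain curves to FFT bins at several scales, can be sketched as follows (`masf_sketch` and the hand-fixed gain curves are illustrative assumptions, not the authors' code; the paper would learn these filters):

```python
import numpy as np

rng = np.random.default_rng(0)

def masf_sketch(x, filters):
    """Frequency-selective filtering: a minimal stand-in for MASF.

    x       : (C, T) per-channel skeleton features over time
    filters : list of (T//2 + 1,) real gain curves, one per "scale";
              learned in the paper, hand-fixed here for illustration.
    """
    spec = np.fft.rfft(x, axis=-1)            # time -> frequency, per channel
    branches = [np.fft.irfft(spec * g, n=x.shape[-1], axis=-1)
                for g in filters]             # amplify/suppress bins, go back
    return np.mean(branches, axis=0)          # fuse the multi-scale branches

C, T = 8, 64
x = rng.standard_normal((C, T))
bins = np.arange(T // 2 + 1)
low_pass = (bins < 8).astype(float)           # keep only slow motion
high_pass = 1.0 - low_pass                    # keep only fast motion
y = masf_sketch(x, [low_pass, high_pass])

# Sanity check: complementary gains split and recombine the signal exactly.
assert np.allclose(2 * y, x)
```

The key design point survives even in this toy version: the filtering is a per-bin multiplication in the spectral domain, so "suppressing shared components" is just driving the corresponding gains toward zero.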

2. Adjacent Action Discrepancy Loss (AADL)

This is the "surgical objective." By maximizing the amplitude spectrum difference between adjacent segments, the model is forced to learn features that are statistically distinct across action boundaries. This directly addresses boundary ambiguity.
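A minimal stand-in for this objective, assuming a plain L2 distance between amplitude spectra (the paper's exact distance measure and normalization may differ):

```python
import numpy as np

def aadl_sketch(feat, boundaries, n_bins=17):
    """Adjacent Action Discrepancy Loss, illustrative only.

    feat       : (C, T) frame-wise features
    boundaries : segment edges, e.g. [0, t1, t2, T]
    Returns a loss to MINIMIZE: the negative mean L2 distance between the
    amplitude spectra of neighbouring segments, so gradient descent pushes
    adjacent actions apart in the frequency domain.
    """
    amps = []
    for s, e in zip(boundaries[:-1], boundaries[1:]):
        # Fixed FFT length so segments of unequal length stay comparable.
        spec = np.fft.rfft(feat[:, s:e], n=2 * (n_bins - 1), axis=-1)
        amps.append(np.abs(spec))
    diffs = [np.linalg.norm(a - b) for a, b in zip(amps[:-1], amps[1:])]
    return -float(np.mean(diffs))

seg_a = np.sin(0.5 * np.arange(32))
seg_b = np.sin(2.0 * np.arange(32))
same  = np.concatenate([seg_a, seg_a])[None, :]   # no change at t = 32
mixed = np.concatenate([seg_a, seg_b])[None, :]   # motion changes at t = 32
loss_same  = aadl_sketch(same,  [0, 32, 64])
loss_mixed = aadl_sketch(mixed, [0, 32, 64])
# Identical neighbours give zero discrepancy; distinct ones are rewarded.
assert loss_mixed < loss_same == 0.0
```

Because the loss compares amplitude spectra rather than raw frames, it rewards the model for making adjacent segments spectrally distinct even when their time-domain means are similar, which is the boundary-ambiguity failure mode described above.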

3. Frequency-Aware Channel Mixer (FACM)

Instead of mixing channels in the time domain, FACM performs mixing in the spectral space by processing real and imaginary components. This allows for parameter-efficient "channel evolution" that respects the periodic nature of the data.
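A rough sketch of spectral channel mixing under these assumptions (`facm_sketch` and the separate real/imaginary weight matrices are illustrative, not the paper's exact design):

```python
import numpy as np

rng = np.random.default_rng(1)

def facm_sketch(x, w_re, w_im):
    """Channel mixing in the spectral domain: an illustrative FACM stand-in.

    x            : (C, T) features
    w_re, w_im   : (C, C) mixing weights for the real and imaginary parts.
    The spectrum's real and imaginary components each get their own
    channel-mixing matrix; the mixed spectrum is then transformed back.
    """
    spec = np.fft.rfft(x, axis=-1)                 # (C, T//2 + 1) complex
    mixed = w_re @ spec.real + 1j * (w_im @ spec.imag)
    return np.fft.irfft(mixed, n=x.shape[-1], axis=-1)

C, T = 8, 64
x = rng.standard_normal((C, T))
# Sanity check: identity weights leave the signal untouched.
y = facm_sketch(x, np.eye(C), np.eye(C))
assert np.allclose(y, x)
```

Note the parameter efficiency: two C x C matrices mix every frequency bin at once, whereas a time-domain mixer of comparable expressiveness would need kernels spanning the temporal axis as well.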

[Figure: Core Components Detail]

Experimental Results: Precision and Efficiency

Spectral Scalpel was tested on five diverse datasets, including PKU-MMD v2 and MCFS-130.

  • Performance: It achieved a +4.8% F1@50 improvement on PKU-MMD (X-view), a significant jump for this task.
  • Efficiency: Despite operating on complex-valued spectra, the method is lightweight. The FFT's O(T log T) cost makes it faster than many Transformer variants, requiring only 146 ms per video at inference.
  • Robustness: The model is remarkably resilient to noise. When 30% of joints are occluded, Spectral Scalpel’s performance drops far less than its predecessors (DeST/LaSA) because the "noise" typically resides in frequency bands that the spectral filters learn to ignore.

[Figure: Performance Comparison]

Deep Insight: Beyond Time-Domain Thinking

The most striking visualization in the paper is the comparison of "Unfiltered" vs "Filtered" frame-wise activations. In the unfiltered version, multiple different actions show nearly identical mean values. Once the "Spectral Scalpel" is applied, the waveforms for different actions (colored segments below) become clearly separated in amplitude and frequency.

[Figure: Spectral Visual Evidence]

Conclusion & Future Outlook

Spectral Scalpel is the first framework to systematically integrate frequency-domain analysis into STAS. It demonstrates that for fast, periodic human motions, the frequency domain is often more discriminative than the time domain.

Limitations: The model still struggles with quasi-static actions (such as standing still), where there is little periodic motion to filter. Future Work: The authors suggest moving toward time-frequency collaborative analysis (e.g., wavelets) to handle static and dynamic actions simultaneously.


Code is available at: https://github.com/HaoyuJi/SpecScalpel
