[arXiv 2026] DA-Flow: Seeing Through the Noise with Restoration-Aware Diffusion
Abstract

DA-Flow is an optical flow estimation framework designed for severely corrupted videos. It leverages a "lifted" image restoration diffusion model with spatio-temporal attention to extract degradation-aware features, achieving new state-of-the-art performance on benchmarks such as Sintel and Spring under real-world noise and blur.

TL;DR

Optical flow in the "wild" is often a mess of motion blur, sensor noise, and compression artifacts. DA-Flow tackles this by repurposing image restoration diffusion models. By "lifting" these models with spatio-temporal attention, the authors extract features that are both aware of degradations and rich in geometric cues, setting a new bar for flow accuracy in corrupted video sequences.

Problem & Motivation: The "Blindness" of Traditional Flow

Standard optical flow architectures, such as RAFT or SEA-RAFT, rely on clean, high-frequency textures to establish pixel-level correspondences. In real-world scenarios—think low-light surveillance or high-speed dashcam footage—these textures are often obliterated.

The authors argue that this isn't just a data distribution problem; it's an inherently ill-posed inverse problem. When pixels are blurred or noisy, the matching signal disappears. To solve this, a model needs "prior knowledge" of what a clean scene should look like and how specific degradations (like JPEG artifacts) warp that reality.

Methodology: Lifting Diffusion for Temporal Awareness

The core innovation lies in the use of Diffusion Transformers (DiT) trained for image restoration. These models are already experts at understanding corruptions. However, image models lack the "temporal glue" needed for motion.

1. Spatio-Temporal Lifting

Instead of using a heavy video diffusion backbone (which often collapses temporal resolution), the authors take a pretrained DiT4SR (Image Restoration) model and inject Full Spatio-Temporal MM-Attention. This allows tokens in Frame A to attend to all tokens in Frame B, enabling the model to "find" correspondences while maintaining independent spatial latents.
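A minimal sketch of the lifting idea, with illustrative names and toy shapes (this is not the authors' code): the tokens of the two frames are concatenated into one joint sequence, so a single attention call lets every token in frame A attend to every token in frame B while each frame keeps its own spatial latent.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatiotemporal_attention(frame_a, frame_b, w_q, w_k, w_v):
    """Full spatio-temporal attention: flatten two frames into one
    joint token sequence so cross-frame correspondences are visible
    to the attention map, then split the output back per frame."""
    tokens = np.concatenate([frame_a, frame_b], axis=0)  # (2N, d)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])              # (2N, 2N)
    out = softmax(scores) @ v                            # (2N, d)
    n = frame_a.shape[0]
    return out[:n], out[n:]

# toy example: 4 tokens per frame, 8-dim embeddings
rng = np.random.default_rng(0)
d = 8
fa, fb = rng.normal(size=(4, d)), rng.normal(size=(4, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
out_a, out_b = spatiotemporal_attention(fa, fb, wq, wk, wv)
print(out_a.shape, out_b.shape)  # (4, 8) (4, 8)
```

The key design point is that the attention score matrix is (2N, 2N): the off-diagonal blocks are exactly the cross-frame matching signal the flow head later exploits.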

Figure 1: The DA-Flow pipeline, showing the fusion of lifted diffusion features and CNN features.

2. Hybrid Feature Encoding

Diffusion features are powerful but coarse (typically 1/16 resolution). DA-Flow uses a DPT-based upsampler to bring these features back to 1/8 resolution and then concatenates them with local, high-frequency features from a standard CNN encoder. This "Best of Both Worlds" approach provides:

  • Diffusion Branch: Global context and degradation awareness.
  • CNN Branch: Local precision for sharp motion boundaries.
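The fusion step can be sketched as follows. All names and channel counts here are illustrative assumptions, and the paper's learned DPT-based upsampler is replaced by crude nearest-neighbor 2x upsampling just to show the shape arithmetic (1/16 resolution up to 1/8, then channel-wise concatenation with the CNN branch):

```python
import numpy as np

def upsample_2x(feat):
    """2x nearest upsampling, (C, H, W) -> (C, 2H, 2W).
    A crude stand-in for the paper's learned DPT-based upsampler."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fuse_features(diff_feat_16, cnn_feat_8):
    """Bring 1/16-resolution diffusion features up to 1/8 resolution,
    then concatenate with the CNN features along the channel axis."""
    diff_feat_8 = upsample_2x(diff_feat_16)
    assert diff_feat_8.shape[1:] == cnn_feat_8.shape[1:]
    return np.concatenate([diff_feat_8, cnn_feat_8], axis=0)

# toy shapes for a 128x128 input: 1/16 is 8x8, 1/8 is 16x16
diff = np.zeros((256, 8, 8))   # coarse, degradation-aware diffusion features
cnn = np.zeros((128, 16, 16))  # finer, high-frequency CNN features
fused = fuse_features(diff, cnn)
print(fused.shape)  # (384, 16, 16)
```

Concatenation (rather than addition) keeps the two branches' information separable, so the downstream flow decoder can weigh global context against local detail on its own.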

Experiments & SOTA Results

The model was tested on degraded versions of Sintel, Spring, and TartanAir.

| Model | Sintel EPE ↓ | Spring EPE ↓ | TartanAir Outlier (1px) ↓ |
| :--- | :--- | :--- | :--- |
| RAFT | 10.69 | 3.94 | 75.17% |
| SEA-RAFT | 10.18 | 2.70 | 77.85% |
| DA-Flow (Ours) | 6.91 | 2.21 | 72.35% |

DA-Flow doesn't just improve the average error; it substantially reduces outliers. Qualitative results show that while baselines produce "noisy" flow fields in blurred regions, DA-Flow maintains clean, sharp boundaries.

Figure 2: Visualizing the difference. DA-Flow (right) recovers coherent motion where baselines (middle) see only noise.

Deep Insights: Why it Works

  • Layer Selection: Not all diffusion layers are equal. The authors found that specific intermediate layers (3, 13, 16, 17) in the MM-DiT block provide the best "correspondence-ready" features.
  • Zero-Shot Capability: Even before training for flow, the lifted restoration features showed inherent matching ability, proving that the restoration task forces the model to learn the underlying scene geometry.

Critical Analysis & Future Work

Limitations: The elephant in the room is inference speed. Because DA-Flow relies on a diffusion denoising process (even with 10 steps), it is significantly slower than purely discriminative models like RAFT.

Future Outlook: The authors suggest that one-step distillation (like LCM or SDXL Turbo) could be the key to making this technology viable for real-time applications. Beyond flow, this "restoration-feature-fusion" concept could likely revolutionize other tasks like depth estimation or tracking in adverse weather conditions.

Conclusion

DA-Flow shifts the paradigm of robust optical flow. Instead of just trying to be "robust" to noise, it uses a generative prior to understand and undo the noise, establishing a new standard for dense correspondence in the real world.
