FSGNet: A Frequency-Aware and Semantic Guidance Network for Infrared Small Target Detection

FSGNet: Mastering the Frequency and Semantic Domains for Infrared Small Target Detection

Abstract

FSGNet is a lightweight Infrared Small Target Detection (IRSTD) framework that addresses semantic dilution in U-Net architectures. By integrating frequency-aware filtering and global semantic guidance, it achieves state-of-the-art (SOTA) performance on four major benchmarks, including a 5.07-percentage-point IoU improvement on IRSTD-1K over L2SKNet (72.45% vs. 67.38%).

TL;DR

Infrared small target detection (IRSTD) is notoriously difficult because of "semantic dilution" in standard U-Nets and extreme background interference. FSGNet tackles this bottleneck with a spatial-frequency hybrid approach. It uses FFT-based filtering to suppress background clutter and Global Semantic Guidance Flows to keep the decoder anchored to the target's location, achieving SOTA accuracy with the lowest FLOPs among the deep learning methods compared.

The "Vanishing Target" Problem

In IRSTD, targets often occupy only a few pixels and lack texture. While U-Net is the industry standard for segmentation, it has two fatal flaws for this specific task:

Semantic Dilution: As features move from the deep "bottleneck" back up to the shallow layers, the precise localization information is often overwhelmed by high-resolution noise.
Harmful Skip Connections: Skip connections are meant to preserve detail, but in infrared images, they often "leak" background clutter that looks identical to targets in the spatial domain.

Methodology: The FSGNet Trifecta

The authors propose a three-pronged defense against noise and localization errors.

1. MIAM: Capturing Directional Geometry

Standard convolutions use square kernels whose receptive field is centered symmetrically on each pixel. However, small targets and background edges often have specific orientations. The Multi-directional Interactive Attention Module (MIAM) uses Pinwheel-shaped Convolutions (PConv). By applying asymmetric padding and kernels, it "looks" in multiple directions simultaneously to better distinguish the structural distribution of a target from random sensor noise.
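The asymmetric-padding idea can be illustrated with a minimal NumPy sketch. This is not the authors' MIAM or a learned PConv layer: the function names `directional_response` and `pinwheel_features`, the fixed averaging kernel, and the four-branch layout are assumptions chosen only to show how one-sided padding makes a filter look in a single direction.

```python
import numpy as np

# Hypothetical helper: each pixel averages itself and the k-1 pixels that
# follow it in one direction. Fixed weights stand in for learned kernels.
def directional_response(img, k=3, direction="right"):
    h, w = img.shape
    # Asymmetric zero padding: pad only the side the arm extends toward,
    # so the output keeps the input's shape.
    pads = {
        "right": ((0, 0), (0, k - 1)),
        "left": ((0, 0), (k - 1, 0)),
        "down": ((0, k - 1), (0, 0)),
        "up": ((k - 1, 0), (0, 0)),
    }
    padded = np.pad(img, pads[direction], mode="constant")
    horizontal = direction in ("left", "right")
    out = np.zeros((h, w), dtype=float)
    for i in range(k):
        out += padded[:, i:i + w] if horizontal else padded[i:i + h, :]
    return out / k

def pinwheel_features(img, k=3):
    # Stack the four directional branches: right, down, left, up.
    dirs = ("right", "down", "left", "up")
    return np.stack([directional_response(img, k, d) for d in dirs])

# An isolated point target versus a horizontal edge-like line.
point = np.zeros((5, 5))
point[2, 2] = 3.0
line = np.zeros((5, 5))
line[2, :] = 3.0

point_feats = pinwheel_features(point)[:, 2, 2]
line_feats = pinwheel_features(line)[:, 2, 2]
print(point_feats, line_feats)
```

At the center pixel, the point target produces the same response in all four branches, while the line responds three times more strongly along its own axis than across it. That directional contrast is the kind of cue a multi-directional module can use to separate compact targets from oriented clutter.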

Fig 1: The overall architecture of FSGNet, highlighting the MIAM, MFM, and GPM integration.

2. MFM: Filtering in the Frequency Domain

The Multi-scale Frequency-aware Module (MFM) is perhaps the most innovative part of the network. The authors observed that while a target and a background patch might look similar in pixels (spatial domain), their spectral signatures in the frequency domain are distinct.

Process: Features from skip connections are converted via FFT.
Action: Background-like clutter is filtered out in the frequency space.
Result: Only salient target structures are allowed to pass into the decoder.

Fig 2: Detailed view of the MFM, showing the FFT/IFFT cascade.
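The FFT, filter, IFFT sequence described for the MFM can be sketched on a single-channel feature map. The source does not specify the filter the MFM applies, so the fixed radial high-pass mask and the `cutoff` parameter below are assumptions, chosen because smooth background tends to concentrate at low frequencies.

```python
import numpy as np

# Hypothetical stand-in for a frequency-domain skip-connection filter.
def frequency_filter(feat, cutoff=0.1):
    h, w = feat.shape
    spectrum = np.fft.fft2(feat)                 # spatial -> frequency
    fy = np.fft.fftfreq(h)[:, None]              # cycles per pixel, rows
    fx = np.fft.fftfreq(w)[None, :]              # cycles per pixel, columns
    radius = np.sqrt(fx ** 2 + fy ** 2)
    mask = (radius > cutoff).astype(float)       # drop low frequencies (assumed rule)
    filtered = np.fft.ifft2(spectrum * mask)     # frequency -> spatial
    return filtered.real                         # mask is symmetric, so output is real

# Smooth background clutter plus one dim point target.
yy, xx = np.mgrid[0:32, 0:32]
feat = 5.0 + 2.0 * np.sin(2 * np.pi * yy / 32)
feat[16, 16] += 1.0

filtered = frequency_filter(feat)
raw_peak = tuple(int(i) for i in np.unravel_index(np.argmax(feat), feat.shape))
peak = tuple(int(i) for i in np.unravel_index(np.argmax(filtered), filtered.shape))
print(raw_peak, peak)
```

In the raw map the brightest pixel belongs to the background, whereas after filtering the peak sits on the target. A learned or multi-scale filter would replace the fixed mask in practice; the sketch only shows why the frequency domain can separate the two.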

3. GPM & GSGF: The Semantic North Star

To stop the "dilution" of high-level information, the Global Pooling Module (GPM) aggregates context at the deepest layer. Instead of just passing this to the next layer, it creates Global Semantic Guidance Flows (GSGF): direct highways that inject localization cues into every stage of the upsampling process. This ensures the decoder always has a "semantic anchor."
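A rough NumPy sketch of this guidance idea follows. The source does not describe how the GPM computes its output or how the flows are fused into the decoder, so everything here is an assumption: the hypothetical `global_guidance` uses global average pooling to weight channels into a single saliency map, and `inject_guidance` fuses it by nearest-neighbor upsampling and multiplicative modulation.

```python
import numpy as np

def global_guidance(deep_feat):
    # deep_feat: (C, h, w) features from the deepest layer.
    channel_weights = deep_feat.mean(axis=(1, 2))              # global average pooling -> (C,)
    saliency = np.tensordot(channel_weights, deep_feat, axes=1)  # weighted channel sum -> (h, w)
    return 1.0 / (1.0 + np.exp(-saliency))                     # squash to (0, 1)

def upsample_nearest(x, factor):
    # Repeat each pixel `factor` times along both axes.
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def inject_guidance(decoder_feats, guidance):
    # decoder_feats: list of (C_i, H_i, W_i) arrays whose sizes are integer
    # multiples of the guidance map's size.
    out = []
    for feat in decoder_feats:
        factor = feat.shape[1] // guidance.shape[0]
        g = upsample_nearest(guidance, factor)
        out.append(feat * (1.0 + g))   # assumed fusion: residual-style modulation
    return out

# Deepest layer responds at one location; two decoder stages at 2x and 4x resolution.
deep = np.zeros((4, 4, 4))
deep[:, 1, 2] = 2.0
guidance = global_guidance(deep)
stages = [np.ones((2, 8, 8)), np.ones((1, 16, 16))]
guided = inject_guidance(stages, guidance)
print([g.shape for g in guided])
```

Every decoder stage receives the same location cue directly from the deepest layer, rather than a version that has been progressively mixed with shallow features on the way up.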

Performance Benchmarks

FSGNet was tested against 16 SOTA methods (including UIUNet and DNANet).

Accuracy: Achieved an IoU of 72.45% on IRSTD-1K, 5.07 percentage points above L2SKNet (67.38%).
Robustness: In the "NoisySIRST" test (adding Gaussian noise), FSGNet maintained a much higher IoU than its peers at low SNR (Signal-to-Noise Ratio) levels.
Efficiency: Despite the added FFT operations, the model is lightweight. It achieves the lowest FLOPs among the deep learning methods compared, which makes it well suited to edge deployment on drones or satellites.
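For readers who want to reproduce this kind of evaluation, the two quantities involved can be sketched as follows. The IoU definition is the standard one for binary masks. The noise function is an assumption: it defines SNR as mean signal power over noise variance, which may differ from the exact NoisySIRST protocol, and the names `iou` and `add_gaussian_noise` are hypothetical.

```python
import numpy as np

def iou(pred_mask, gt_mask):
    # Intersection over union of two binary masks.
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as a perfect match (a convention)
    return np.logical_and(pred, gt).sum() / union

def add_gaussian_noise(img, snr_db, rng):
    # Assumed SNR definition: mean signal power / noise variance, in dB.
    signal_power = np.mean(img.astype(float) ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return img + rng.normal(0.0, np.sqrt(noise_power), img.shape)

# A 3x3 target predicted one row too low: 6 shared pixels, 12 in the union.
gt = np.zeros((32, 32), dtype=bool)
gt[10:13, 10:13] = True
pred = np.zeros((32, 32), dtype=bool)
pred[11:14, 10:13] = True
score = iou(pred, gt)

# The reported gap between the two IRSTD-1K scores, in percentage points.
gap = round(72.45 - 67.38, 2)

rng = np.random.default_rng(0)
clean = np.ones((64, 64))
noisy = add_gaussian_noise(clean, snr_db=0.0, rng=rng)
print(score, gap)
```

The shifted prediction shows how harsh IoU is for small targets: a one-pixel offset on a 3x3 target already halves the score.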

Fig 3: Qualitative comparison. FSGNet (bottom row) shows much cleaner detections, with fewer false alarms (yellow boxes) and missed targets (blue boxes).

Critical Insight: Why it Works

The success of FSGNet lies in its realization that spatial data isn't everything. By moving the "battle" against background clutter into the frequency domain (via MFM) and providing a "top-down" semantic map (via GPM), it directly targets the signal-to-noise problem that has long limited IRSTD.

Conclusion & Future Work

FSGNet sets a new standard for efficient, high-precision infrared detection. While the current model relies on fixed frequency-domain operations, a future evolution could involve learnable spectral filters that adapt to specific sensor types or atmospheric conditions.

Takeaway for Practitioners: If your small object detector is struggling with "ghosting" or false alarms from complex backgrounds, consider adding a frequency-domain filtering stage to your skip connections.
