WisPaper
WisPaper
学术搜索
学术问答
价格
TrueCite
Anti-I2V: Weaponizing Internal Representation Collapse to Stop Malicious Deepfakes
总结
问题
方法
结果
要点
摘要

Anti-I2V is a novel defense framework designed to safeguard personal photos from malicious human image-to-video (I2V) generation. It introduces a Dual-Space Perturbation (DSP) strategy across Lab* and frequency domains, achieving state-of-the-art disruption of identity preservation and temporal coherence across UNet and Diffusion Transformer (DiT) architectures.

TL;DR

As Image-to-Video (I2V) models like CogVideoX and OpenSora become more powerful, the risk of unauthorized "human animation"—creating fake videos from a single photo—skyrockets. Anti-I2V is a new defense mechanism that injects imperceptible adversarial noise into your photos. Unlike previous methods, it doesn't just "mess up the pixels"; it targets the deep semantic layers of Diffusion Transformers (DiTs), causing the video generation process to "collapse" internally, resulting in garbled, unrecognizable videos.

Motivation: Why DiTs are Harder to Guard

Most current adversarial defenses (like AdvDM or MIST) were built for U-Net-based text-to-image models (like Stable Diffusion v1.5). Video Generation Models (VGMs) pose two new challenges:

  1. Temporal Consistency: These models are designed to be extremely robust; even if one frame is slightly "off," the attention mechanism "corrects" it using information from surrounding frames.
  2. Architecture Shift: The industry is moving from U-Nets to Diffusion Transformers (DiTs). DiTs have a much larger capacity and use advanced attention mechanisms that tend to ignore traditional pixel-level noise.

Overall Architecture

Methodology: Targeting the Heart of the Model

1. Dual-Space Perturbation (DSP)

Instead of optimizing noise in the standard RGB space, Anti-I2V operates in:

  • Lab Color Space*: Specifically perturbing the a* and b* channels to disrupt color perception without affecting brightness, making the noise invisible to humans but confusing to models.
  • Frequency Domain (DCT): By targeting low-frequency components via Discrete Cosine Transform, the attack hits the "structural" core of the image.

2. Internal Representation Collapse (IRC)

The authors discovered that in models like OpenSora and CogVideoX, meaningful semantic features only start to emerge after specific layers (e.g., the 19th or 27th transformer block).

The IRC Loss forces these "deep semantic layers" to look like "shallow structural layers" during the denoising process. By collapsing these high-level representations, the model loses its ability to reconstruct human identity.

Layer Visualization

Experiments: Breaking the "Fake" Videos

The team tested Anti-I2V on CelebV-Text (facial focused) and UCF101 (action focused) datasets using three SOTA backbones: CogVideoX, OpenSora, and DynamiCrafter.

Quantitative Edge

Anti-I2V achieved the lowest Identity Score Matching (ISM) and CLIP-FIQA scores across the board. In simple terms, while other generic attacks caused minor blurring, Anti-I2V effectively "deleted" the person's identity from the generated video.

Qualitative Edge

As seen in the visual results, baseline methods like SDS or VGMShield often leave the human face recognizable. Anti-I2V produces pronounced artifacts and total semantic divergence, rendering the "malicious" video useless.

Visual Results

Critical Insight & Conclusion

The true value of this work lies in the layer-wise analysis. By proving that we can degrade generation fidelity by "anchoring" or "collapsing" specific internal representations, the authors have opened a new front in AI safety.

Limitations: There is an inherent trade-off. Extreme protection can sometimes lead to visible noise if the budget ($\Delta$) is too high. However, by using DSP, Anti-I2V keeps the PSNR and SSIM competitive while maintaining an aggressive defense. For individuals concerned about their digital likeness being used to train or generate fake media, Anti-I2V represents a critical step toward "proactive" privacy.

发现相似论文

试试这些示例

  • Search for recent adversarial defense papers published after 2024 specifically targeting Diffusion Transformer (DiT) architectures in video generation.
  • Investigate the original theoretical framework of 'Adversarial Purification' and how modern defenses like Anti-I2V specifically bypass these purification mechanisms.
  • Explore if latent-space internal representation collapse (IRC) has been successfully applied to protect medical or industrial sensitive images from generative misuse.
目录
Anti-I2V: Weaponizing Internal Representation Collapse to Stop Malicious Deepfakes
1. TL;DR
2. Motivation: Why DiTs are Harder to Guard
3. Methodology: Targeting the Heart of the Model
3.1. 1. Dual-Space Perturbation (DSP)
3.2. 2. Internal Representation Collapse (IRC)
4. Experiments: Breaking the "Fake" Videos
4.1. Quantitative Edge
4.2. Qualitative Edge
5. Critical Insight & Conclusion