[2026] FLUIDWORLD: Is Self-Attention Necessary for World Models?
Abstract

FLUIDWORLD is a novel world model architecture that replaces self-attention and convolutional recurrence with reaction-diffusion partial differential equations (PDEs) as its predictive engine. On the UCF-101 and Moving MNIST benchmarks, it achieves 2x lower reconstruction error and superior multi-step rollout stability compared to parameter-matched Transformer and ConvLSTM baselines.

Executive Summary

TL;DR: FLUIDWORLD challenges the dominance of Transformers in world modeling by replacing self-attention with reaction-diffusion PDEs. In a parameter-matched comparison (~800K parameters), this physics-grounded substrate outperformed Transformer and ConvLSTM baselines in rollout stability and spatial structure preservation, achieving 2x lower reconstruction error.

Background: While the industry defaults to $O(N^2)$ Transformers, this work positions itself as an architectural "back-to-basics" movement. It asks whether the laws of physics—specifically diffusion—can provide a more robust inductive bias for predicting the future than generic combinatorics.

Problem: The Fragility of Latent Imagination

Current world models (like DreamerV3 or ISO-JEPA) typically use Transformers to predict the next latent state. This approach has three fatal flaws:

  1. Computational Waste: Quadratic scaling limits resolution.
  2. Lack of Physics: The model must spend capacity learning that "pixels near each other usually move together."
  3. Error Accumulation: In autoregressive rollouts (mental simulation), small errors compound exponentially because the architecture has no inherent mechanism to "smooth out" noise.
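The compounding in point 3 is easy to see with a toy rollout (an illustration of the failure mode, not the paper's model): if each autoregressive step amplifies the current error by a constant gain, the error grows geometrically with the horizon.

```python
import numpy as np

def rollout_error(gain: float, eps0: float, steps: int) -> list[float]:
    """Toy autoregressive rollout: each step multiplies the error by `gain`."""
    errors, eps = [], eps0
    for _ in range(steps):
        eps *= gain          # error compounds multiplicatively per step
        errors.append(eps)
    return errors

# A modest per-step gain of 1.5 turns a 1e-3 error into ~0.058 in 10 steps.
errs = rollout_error(gain=1.5, eps0=1e-3, steps=10)
```

Any architecture without a dissipative mechanism is exposed to exactly this geometric blow-up; diffusion, by contrast, actively damps perturbations.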

Methodology: The BeliefField and Reaction-Diffusion

Instead of a discrete forward pass, FLUIDWORLD defines a BeliefField ($s_t$) that evolves via a continuous-time PDE.

The Core Equation

The state $u$ evolves according to: $$\frac{du}{d\tau} = D \cdot \nabla^2 u + R(u) + \text{Memory}$$

  • Diffusion ($D \cdot \nabla^2 u$): A multi-scale Laplacian (dilations 1, 4, 16) that propagates information spatially at $O(N)$ cost.
  • Reaction ($R(u)$): A position-wise MLP that handles nonlinear transformations.
  • Bio-Mechanisms: It incorporates Synaptic Fatigue and Lateral Inhibition to prevent channel collapse, ensuring the model uses its full representational capacity.
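Put together, one explicit-Euler step of these dynamics can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the grid size, the tanh MLP, and the weight initialization are assumptions, and the fatigue/inhibition terms are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, HID = 32, 32, 8, 16   # grid height/width, channels, MLP hidden size
D, DT = 0.1, 0.05              # diffusion coefficient, Euler step size

def laplacian(u: np.ndarray, d: int) -> np.ndarray:
    """Five-point discrete Laplacian with dilation d (periodic boundaries)."""
    return (np.roll(u, d, 0) + np.roll(u, -d, 0)
            + np.roll(u, d, 1) + np.roll(u, -d, 1) - 4.0 * u)

# Position-wise reaction MLP R(u); weights are illustrative, not trained.
W1, b1 = 0.1 * rng.standard_normal((C, HID)), np.zeros(HID)
W2, b2 = 0.1 * rng.standard_normal((HID, C)), np.zeros(C)

def step(u: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """One explicit-Euler step of du/dτ = D·∇²u + R(u) + Memory."""
    diff = sum(laplacian(u, d) for d in (1, 4, 16))   # multi-scale diffusion
    react = np.tanh(u @ W1 + b1) @ W2 + b2            # applied per position
    return u + DT * (D * diff + react + memory)

u = rng.standard_normal((H, W, C))     # the BeliefField s_t
mem = np.zeros((H, W, C))
for _ in range(10):
    u = step(u, mem)
```

Note the cost per step: each dilated Laplacian is a constant number of shifted adds per cell, which is where the $O(N)$ scaling claim comes from.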

Figure 1: The FLUIDWORLD pipeline. Note how the BeliefField evolution replaces the standard Transformer block.

Experiments: PDE vs. The World

To show the result wasn't just "parameter hunting," the author compared three models with matched parameter counts (~801K each) on UCF-101.

Results at a Glance

| Metric | FLUIDWORLD (PDE) | Transformer | ConvLSTM |
| :--- | :--- | :--- | :--- |
| Reconstruction MSE | 0.001 | 0.002 | 0.001 |
| Effective Rank | ~20,000 | ~16,500 | ~19,000 |
| Rollout Stability | Stable to $h=3$ | Fails at $h=2$ | Fails at $h=2$ |
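"Effective rank" measures how much of the latent space a model actually uses. One common definition, which may or may not be the paper's exact metric, is the exponential of the Shannon entropy of the normalized singular-value spectrum:

```python
import numpy as np

def effective_rank(X: np.ndarray) -> float:
    """exp(entropy) of the normalized singular values of X (Roy & Vetterli)."""
    s = np.linalg.svd(X, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                              # drop exact zeros before log
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(2)
full = rng.standard_normal((100, 100))        # isotropic → high effective rank
collapsed = np.outer(rng.standard_normal(100),
                     rng.standard_normal(100))  # rank-1 → effective rank ≈ 1
```

A collapsed representation scores near 1 regardless of its nominal dimensionality, which is what the Synaptic Fatigue and Lateral Inhibition mechanisms are there to prevent.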

The most striking discovery was the "Autopoietic Self-Repair." When the latent state was 50% corrupted with noise, the PDE substrate naturally "healed" itself through diffusion.
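The healing effect follows from diffusion alone and is easy to reproduce in isolation. Below is a toy reproduction under assumed settings (grid size, noise level, step size), not the paper's experiment: corrupt half of a smooth field with noise, run pure diffusion, and watch the distance to the clean field shrink.

```python
import numpy as np

rng = np.random.default_rng(1)

def laplacian(u: np.ndarray, d: int = 1) -> np.ndarray:
    """Five-point discrete Laplacian (periodic boundaries)."""
    return (np.roll(u, d, 0) + np.roll(u, -d, 0)
            + np.roll(u, d, 1) + np.roll(u, -d, 1) - 4.0 * u)

# A smooth "clean" latent field: a low-frequency 2D sinusoid.
x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
clean = np.sin(x)[:, None] * np.cos(x)[None, :]

# Corrupt 50% of the cells with unit-variance noise, mimicking the stress test.
mask = rng.random(clean.shape) < 0.5
u = clean + mask * rng.standard_normal(clean.shape)

mse_before = np.mean((u - clean) ** 2)
for _ in range(50):                 # pure diffusion, explicit Euler
    u = u + 0.2 * laplacian(u)
mse_after = np.mean((u - clean) ** 2)
```

The high-frequency corruption is dissipated while the low-frequency structure survives, so the error to the clean field drops sharply; a Transformer block has no analogous built-in restoring force.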

Figure 2: Deliberate state corruption at step 5. The PDE smooths the corruption away (steps 9–11), whereas traditional models would enter a permanent "hallucination" state.

Critical Insight: The Oscillatory Recovery

On Moving MNIST, FLUIDWORLD displayed a non-monotonic SSIM curve. Usually, model accuracy drops steadily over time. FLUIDWORLD's accuracy dropped and then rose again.

Why? The Laplacian operator acts as a spatial "immune system." It identifies high-frequency prediction errors as "noise" and dissipates them via diffusion, allowing the underlying low-frequency structure (the "object") to re-emerge.
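This frequency-selective dissipation can be quantified analytically. For an explicit-Euler diffusion step on an $N$-point periodic grid, standard von Neumann analysis gives a per-step damping factor of $1 - 4\,dt\,\sin^2(\pi k/N)$ for Fourier mode $k$; the specific $N$ and $dt$ below are illustrative, not taken from the paper.

```python
import numpy as np

def damping(k: int, N: int = 64, dt: float = 0.2) -> float:
    """Per-step amplification of Fourier mode k under u ← u + dt·∇²u."""
    return 1.0 - 4.0 * dt * np.sin(np.pi * k / N) ** 2

low = damping(1)    # low-frequency "object" structure: barely touched per step
high = damping(30)  # near-Nyquist "noise": attenuated by roughly 80% per step
```

Low-frequency content is nearly preserved each step while high-frequency error is crushed, which is exactly the mechanism behind the non-monotonic SSIM curve: once the noise is gone, the surviving low-frequency structure dominates the prediction again.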

Conclusion & Future Work

FLUIDWORLD proves that Reaction-Diffusion is a viable, parameter-efficient alternative to Attention.

  • Value: It offers $O(N)$ scaling, making it a prime candidate for high-resolution robotics.
  • Limitations: It is currently $5{-}8\times$ slower to train due to the iterative nature of PDE integration.
  • Future: The next frontier is action-conditioned planning—directing the "flow" of the BeliefField via agent intentions.

This work suggests that the future of World Models may look less like a database (Attention) and more like a fluid (PDE).
