Conditional Diffusion for 3D CT Volume Reconstruction from 2D X-rays

[MICCAI 2025] AXON: Bridging the 2D-to-3D Gap via Multi-Stage Diffusion for Clinical CT Reconstruction

Abstract

The paper introduces AXON, a multi-stage diffusion-based framework designed to reconstruct 3D CT volumes from limited 2D X-ray projections. By combining a Brownian Bridge diffusion model with a ControlNet-based refinement and a 3D super-resolution network, AXON achieves state-of-the-art performance on real clinical data, including an 11.9% improvement in PSNR and 11.0% in SSIM.

TL;DR

Existing 3D CT reconstruction from 2D X-rays often fails in clinical practice because of the massive domain gap between synthetic training data and real medical images. AXON (Advanced X-ray to CT-volume Network) introduces a coarse-to-fine diffusion pipeline that explicitly models the transformation from 2D projections to 3D volumes. By leveraging Brownian Bridge priors and conditional refinement, it achieves an 11.9% PSNR boost and produces high-resolution, diagnostic-grade 3D volumes from standard radiographs.

1. The Dimensionality Crisis in Medical Imaging

Computed Tomography (CT) is the gold standard for 3D anatomy, yet it is expensive and exposes patients to a considerably higher radiation dose than plain radiography. X-rays are far more accessible, but they are 2D "shadows" in which depth information is collapsed along the beam direction.

The technical challenge is two-fold:

Inverse Problem Complexity: Mapping 2D pixels to 3D voxels is mathematically ill-posed (many 3D structures can produce the same 2D projection).
The DRR Trap: Most AI models are trained on Digitally Reconstructed Radiographs (DRRs)—clean, synthetic X-rays. When these models meet real clinical X-rays (with scatter, noise, and bone-soft tissue artifacts), they typically fail.
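
The ill-posedness in the first point can be made concrete with a toy example. The sketch below assumes a simplified parallel-beam model in which a projection is just the sum of voxel values along the depth axis (real radiographs follow exponential attenuation and cone-beam geometry). The helper `project_along_depth` and the 2×2×2 volumes are illustrative and not taken from the paper.

```python
def project_along_depth(volume):
    """Sum a volume indexed [depth][row][col] along depth, giving a 2D image."""
    rows = len(volume[0])
    cols = len(volume[0][0])
    return [
        [sum(depth_slice[r][c] for depth_slice in volume) for c in range(cols)]
        for r in range(rows)
    ]


# Same bright voxel, placed in the front slice versus the back slice.
near = [[[1, 0], [0, 0]], [[0, 0], [0, 0]]]
far = [[[0, 0], [0, 0]], [[1, 0], [0, 0]]]

# The volumes differ, yet their projections are identical.
print(near != far)                                        # True
print(project_along_depth(near) == project_along_depth(far))  # True
```

A reconstruction network therefore cannot recover depth from the pixels alone; it has to rely on learned anatomical priors to pick a plausible volume among the many that fit.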

2. AXON: A Hierarchical Generative Strategy

The authors propose that direct 2D-to-3D mapping is too "heavy" for a single network. Instead, they decompose the task into three logical stages:

Stage I: CoarseDiff (Structural Anchoring)

Rather than starting from pure noise, AXON uses the Brownian Bridge Diffusion Model (BBDM). It constructs a stochastic bridge between the 2D image signals (lifted into 3D space via a global encoder) and the target CT distribution. This stage focuses on getting the "big picture" right—placing organs and bones in their approximate 3D locations.
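
As a rough illustration of what a Brownian bridge forward process looks like, the sketch below follows the formulation of the original BBDM work: a linear schedule m_t = t/T and variance 2s(m_t - m_t²), which vanishes at both endpoints. Whether AXON uses exactly this schedule is an assumption; the names `bridge_variance` and `bridge_sample`, the scale `s`, and the flat-list representation of a volume are all hypothetical.

```python
import math
import random


def bridge_variance(t, num_steps, s=1.0):
    """Variance of the bridge at step t; zero at t=0 and t=num_steps."""
    m_t = t / num_steps
    return max(0.0, 2.0 * s * (m_t - m_t ** 2))


def bridge_sample(x0, y, t, num_steps, s=1.0, rng=None):
    """Sample x_t on a bridge pinned at x0 (target CT) and y (lifted X-ray features).

    x0 and y are flat lists of equal length standing in for 3D tensors.
    """
    rng = rng or random.Random(0)
    m_t = t / num_steps
    sigma = math.sqrt(bridge_variance(t, num_steps, s))
    return [
        (1.0 - m_t) * a + m_t * b + sigma * rng.gauss(0.0, 1.0)
        for a, b in zip(x0, y)
    ]


ct_voxels = [0.0, 1.0, 2.0]        # toy "CT" endpoint
lifted_xray = [10.0, 11.0, 12.0]   # toy "lifted X-ray" endpoint
midpoint = bridge_sample(ct_voxels, lifted_xray, t=500, num_steps=1000)
```

Unlike a standard diffusion process, the endpoint at t = T is the conditioning signal itself rather than pure Gaussian noise, which is what lets the reverse process start from a structurally informative state.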

Stage II: FineDiff (Intensity Refinement)

Once the global structure is established, a frozen 3D diffusion model (acting as a high-capacity anatomical prior) is guided by a ControlNet branch. This branch takes the coarse output from Stage I and "paints in" the high-frequency details: vascular branches, lung parenchyma, and sharp tissue boundaries.
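
The frozen-backbone-plus-control-branch idea can be sketched in a few lines. This toy assumes the zero-initialised output projection used in the original 2D ControlNet design, so that the guided model starts out identical to the frozen prior; the summary above does not say whether AXON keeps that detail. `frozen_prior`, `ControlBranch`, and `guided_prior` are hypothetical stand-ins operating on flat lists rather than 3D tensors.

```python
def frozen_prior(x):
    """Stand-in for one block of the frozen 3D diffusion backbone."""
    return [2.0 * v + 1.0 for v in x]


class ControlBranch:
    """Trainable branch that turns the coarse volume into a residual."""

    def __init__(self):
        self.encoder_weight = 0.5    # trainable encoder (toy scalar)
        self.zero_proj_weight = 0.0  # zero-initialised output projection

    def __call__(self, coarse):
        return [self.zero_proj_weight * self.encoder_weight * c for c in coarse]


def guided_prior(x, coarse, branch):
    """Frozen block output plus the control residual."""
    return [f + r for f, r in zip(frozen_prior(x), branch(coarse))]


branch = ControlBranch()
noisy = [0.1, 0.2]
coarse = [1.0, -1.0]
at_init = guided_prior(noisy, coarse, branch)  # equals frozen_prior(noisy)

branch.zero_proj_weight = 1.0                  # pretend training has moved it
after_training = guided_prior(noisy, coarse, branch)
```

Starting from a zero residual means the pretrained prior is not disturbed early in training; only the branch parameters are updated, so the coarse volume gradually gains influence over the output.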

Stage III: 3D Super-Resolution

To circumvent the "VRAM wall" of 3D diffusion, AXON generates at 128³ and uses a specialized SR-Net to upscale to 256³. Unlike naive interpolation, this module maintains volumetric consistency using 3D Transposed Convolutions.
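
A transposed convolution can double the side length of a volume in a single layer. The sketch below shows the standard per-axis output-size formula (dilation 1) and a naive 1D version of the operation. The kernel size 4, stride 2, and padding 1 are assumed values that happen to map 128 to 256; the summary does not give SR-Net's actual hyperparameters, and both function names are illustrative.

```python
def transposed_conv_out_size(n_in, kernel, stride, padding, output_padding=0):
    """Output length along one axis of a transposed convolution (dilation 1)."""
    return (n_in - 1) * stride - 2 * padding + kernel + output_padding


def transposed_conv1d(signal, kernel, stride, padding):
    """Naive 1D transposed convolution: scatter each sample, then crop the padding."""
    full_len = (len(signal) - 1) * stride + len(kernel)
    full = [0.0] * full_len
    for i, value in enumerate(signal):
        for k, weight in enumerate(kernel):
            full[i * stride + k] += value * weight
    return full[padding:full_len - padding]


print(transposed_conv_out_size(128, kernel=4, stride=2, padding=1))  # 256

upsampled = transposed_conv1d([1.0, 2.0, 3.0, 4.0], [0.25, 0.75, 0.75, 0.25],
                              stride=2, padding=1)
print(len(upsampled))  # 8
```

Because the kernel weights are learned, the layer can do more than fixed trilinear interpolation, and applying the same operation along all three axes treats depth the same way as height and width.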

Figure 1: AXON framework architecture. The multi-stage pipeline moves from the 2D X-ray to a coarse 3D volume, followed by fine-grained diffusion refinement.

3. Clinical Validation & Performance

The most impressive aspect of this work is its validation on real paired clinical data. Most papers in this field stop at synthetic testing; AXON tests on the LIDC-IDRI dataset and an entirely unseen in-house dataset.

Quantitative Superiority

| Dataset | Method | PSNR (dB, ↑) | SSIM (↑) |
| :--- | :--- | :--- | :--- |
| In-house (Real) | X2CT-GAN (Baseline) | 19.42 | 0.426 |
| In-house (Real) | AXON (Ours) | 21.24 | 0.532 |

On the in-house dataset, SSIM rises from 0.426 for the X2CT-GAN baseline to 0.532 for AXON, a 24.9% relative increase, while PSNR rises from 19.42 dB to 21.24 dB. This indicates that AXON isn't just generating "realistic-looking" noise; it is capturing the actual anatomical structure of the patient.
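
The relative gains can be checked directly from the two in-house rows of the table. The helper name `relative_gain_percent` is illustrative; the four input numbers are the table values.

```python
def relative_gain_percent(baseline, ours):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


psnr_gain = relative_gain_percent(19.42, 21.24)
ssim_gain = relative_gain_percent(0.426, 0.532)
print(f"PSNR: +{psnr_gain:.1f}%  SSIM: +{ssim_gain:.1f}%")
# prints "PSNR: +9.4%  SSIM: +24.9%"
```

Note that these row-level figures differ from the 11.9% PSNR and 11.0% SSIM improvements quoted in the abstract, which presumably refer to a different comparison that this summary does not spell out.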

Visual Fidelity

Visual comparisons reveal that while GAN-based methods (like X2CT-GAN) suffer from blurriness and checkerboard artifacts, AXON preserves the intricate branching of the pulmonary vasculature and the distinct borders of heart chambers.

Figure 2: Experimental results comparison. Qualitative examples show AXON reconstructing sharp anatomical boundaries where the baselines are blurred.

4. Why It Works: The Power of Bi-planar Conditioning

The authors demonstrate that while single X-rays are helpful, bi-planar inputs (Frontal + Lateral) significantly reduce depth ambiguity. AXON’s FusionBlock effectively integrates these two views, bumping the PSNR up to 22.03 dB at 256³ resolution. This suggests AXON could be a "drop-in" enhancement for standard radiology rooms equipped with bi-planar X-ray machines.
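
A toy example shows why a second view reduces depth ambiguity. It assumes a simplified model in which each view is a plain sum of voxel values along one axis, and it says nothing about how FusionBlock combines the views internally, which this summary does not describe. `frontal_view` and `lateral_view` are illustrative helpers.

```python
def frontal_view(volume):
    """Sum a [depth][row][col] volume along depth, giving a [row][col] image."""
    rows, cols = len(volume[0]), len(volume[0][0])
    return [[sum(s[r][c] for s in volume) for c in range(cols)] for r in range(rows)]


def lateral_view(volume):
    """Sum a [depth][row][col] volume along width, giving a [row][depth] image."""
    rows = len(volume[0])
    return [[sum(s[r]) for s in volume] for r in range(rows)]


# Same bright voxel, placed in the front slice versus the back slice.
near = [[[1, 0], [0, 0]], [[0, 0], [0, 0]]]
far = [[[0, 0], [0, 0]], [[1, 0], [0, 0]]]

print(frontal_view(near) == frontal_view(far))  # True: frontal view cannot tell them apart
print(lateral_view(near) == lateral_view(far))  # False: lateral view separates them
```

Two views still do not determine a volume uniquely in general, but they rule out a large class of depth permutations that a single projection leaves open.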

5. Critical Analysis & Future Outlook

Strengths:

Real-world Focus: Strong emphasis on generalizing to real clinical noise rather than just fitting synthetic DRRs.
Modular Design: The separation of structure (BBDM) and texture (ControlNet) makes the training more stable than end-to-end GANs.

Limitations:

Inference Time: As with all diffusion models, iterative sampling is slower than GAN-based single-pass inference.
Resolution Limits: While 256³ is a great step forward, clinical CTs often reach 512³ or higher; further scaling will require even more efficient latent representations.
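
The scaling pressure behind the resolution limit is simple arithmetic. The sketch below assumes a single-channel float32 volume (4 bytes per voxel); real networks hold many feature channels and intermediate activations, so actual memory use is a large multiple of these figures. `volume_mib` is an illustrative helper.

```python
def volume_mib(side, bytes_per_voxel=4, channels=1):
    """Memory in MiB for a cubic volume with the given side length."""
    return side ** 3 * bytes_per_voxel * channels / 2 ** 20


for side in (128, 256, 512):
    print(f"{side}^3: {side ** 3:>11,d} voxels, {volume_mib(side):6.1f} MiB")
```

Each doubling of the side length multiplies the voxel count by eight, so going from 256³ to 512³ costs as much again as the jump from 128³ to 256³ that the super-resolution stage was introduced to handle.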

The Bottom Line

AXON sets a new benchmark for 3D medical synthesis. By moving away from "black-box" GANs towards structured, multi-stage diffusion, it brings us one step closer to a future where a simple X-ray can provide the diagnostic depth of a full CT scan.

For more details, check the official implementation at GitHub: ai-med/AXON.
