Which papers propose methods to accelerate the generation process in diffusion models?

The core trade-off: how much speed can you gain without ruining image quality?

Every acceleration method for diffusion models faces the same fundamental tension: fewer sampling steps (or cheaper steps) mean faster generation, but often at the cost of blurrier, noisier, or less coherent outputs. The best methods keep quality nearly intact while cutting inference time dramatically. For instance, the distillation approach of Meng et al. [3] produces images visually comparable to those of the original model with as few as 4 sampling steps on ImageNet 64×64, keeping FID/IS scores close to the original while sampling up to 256× faster. Similarly, Flash Diffusion [11] reaches state-of-the-art FID and CLIP-Score for few-step generation on the COCO datasets while requiring only several GPU hours of training. At the other end, simpler caching methods such as DeepCache [8] offer more modest speedups (2.3× for Stable Diffusion v1.5) with almost no quality degradation (a 0.05 drop in CLIP Score). The takeaway: if you need extreme speed (e.g., real-time generation), distillation is the strongest option; if you need to preserve quality above all, caching or better numerical solvers may be safer.
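
To make the step-count side of this trade-off concrete, here is a minimal sketch that computes an idealized speedup as a ratio of network forward passes (NFE). It assumes sampling cost is dominated by network evaluations, so it ignores per-step overhead and says nothing about quality. The function name and the guided-sampling step counts are hypothetical. Speedups that come from cheaper steps rather than fewer ones (such as DeepCache's) are not captured by this ratio.

```python
def nfe_speedup(base_steps, fast_steps, base_evals_per_step=1, fast_evals_per_step=1):
    """Ratio of network forward passes: baseline sampler vs. accelerated sampler.

    Assumes cost is dominated by network evaluations, so this is an idealized
    upper bound on wall-clock speedup, not a measured one.
    """
    if base_steps <= 0 or fast_steps <= 0:
        raise ValueError("step counts must be positive")
    base_nfe = base_steps * base_evals_per_step
    fast_nfe = fast_steps * fast_evals_per_step
    return base_nfe / fast_nfe


# Step-count ratio quoted for PNDM [7] vs. DDIM: 1000 -> 50 steps.
# (Treats each step as one evaluation; solvers with multi-evaluation
# warm-up steps cost slightly more than this.)
print(nfe_speedup(1000, 50))  # 20.0

# Hypothetical numbers: a classifier-free-guided sampler evaluates the network
# twice per step (conditional + unconditional); a student distilled in the
# manner of Meng et al. [3] takes the guidance weight as input and needs one.
print(nfe_speedup(64, 4, base_evals_per_step=2, fast_evals_per_step=1))  # 32.0
```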

Four main families of acceleration methods — and how they compare

The papers cluster into four distinct strategies. First, distillation methods [3][11] train a student model to reproduce the original model's output in far fewer steps. For example, Meng et al. [3] distilled classifier-free guided diffusion into a model that needs just 1–4 steps, accelerating inference by at least 10× on latent-space models such as Stable Diffusion. Second, feature caching methods [1][8] exploit redundancy across denoising steps: DeepCache [8] reuses high-level U-Net features across adjacent steps, achieving a 4.1× speedup on LDM-4-G with only a 0.22 FID increase on ImageNet. LESA [1] goes further with a learnable, stage-aware predictor that adapts to different noise levels, yielding 5× acceleration on FLUX.1-dev with just a 1.0% quality drop. Third, better numerical solvers and solver refinements [5][7] reduce the error of discretizing the underlying differential equation. PNDMs [7] treat diffusion sampling as solving ODEs on manifolds and generate higher-quality images in 50 steps than DDIMs do in 1000 steps (20× fewer steps). The timestep tuner [5] adjusts the integral direction for each interval and, applied to DDIM, improves FID from 9.65 to 6.07 on LSUN Bedroom with only 10 steps. Fourth, latent-space methods [6][9] run the diffusion process in a compressed representation. LaDiffuSeq [9] quadruples sampling speed for text generation by working in a low-dimensional latent space, while the LDM+Cold Diffusion framework [6] achieves 14× faster sampling for CT denoising, additionally replacing Gaussian noise with a task-specific degradation.
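
PNDMs [7] and the timestep tuner [5] are both measured against DDIM, so it helps to see what a DDIM sampler with skipped steps does. The sketch below assumes an ε-prediction model and a DDPM-style linear β schedule (1e-4 to 0.02 over 1000 steps); `eps_model` is a placeholder for a trained network, not any library's API. With a perfect noise predictor, every step count recovers x₀ exactly; with a learned predictor the error roughly grows as more timesteps are skipped, and that discretization error is what higher-order solvers and the timestep tuner aim to reduce.

```python
import numpy as np


def linear_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a DDPM-style linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from timestep t to an earlier one."""
    # Predict the clean sample implied by the current noise estimate.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Re-noise it to the earlier timestep along the same direction.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred


def ddim_sample(x_T, eps_model, alpha_bars, num_steps):
    """Run DDIM on an evenly spaced subsequence of num_steps training timesteps."""
    timesteps = np.linspace(len(alpha_bars) - 1, 0, num_steps).round().astype(int)
    x = x_T
    for i, t in enumerate(timesteps):
        # After the last timestep, jump to alpha_bar = 1 (the clean sample).
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = ddim_step(x, eps_model(x, t), alpha_bars[t], a_prev)
    return x
```

Only `num_steps` changes between a 1000-step and a 10-step run; the network and schedule stay the same, which is why step skipping needs no retraining.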
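
The caching idea behind DeepCache [8] can be illustrated with a toy loop that recomputes an expensive branch only every `cache_interval` steps and reuses its output in between. Everything here is hypothetical: `ToyCachedDenoiser` is a stand-in built from random linear maps, not a U-Net, and the update rule is a placeholder rather than a real sampler. It shows only the bookkeeping (which branch runs when), not DeepCache's actual choice of features or LESA's learned predictor.

```python
import numpy as np


class ToyCachedDenoiser:
    """Hypothetical two-branch denoiser used only to illustrate caching.

    deep():    stand-in for the expensive high-level path.
    shallow(): stand-in for the cheap path that runs at every step.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w_deep = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.w_shallow = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.deep_calls = 0  # counts expensive evaluations

    def deep(self, x, t):
        self.deep_calls += 1
        return np.tanh(self.w_deep @ x + t)

    def shallow(self, x, deep_feat):
        return self.w_shallow @ x + deep_feat


def sample_with_cache(model, x, timesteps, cache_interval, step_size=0.02):
    """Recompute deep features every cache_interval steps; reuse them in between."""
    if cache_interval < 1:
        raise ValueError("cache_interval must be >= 1")
    cached = None
    for i, t in enumerate(timesteps):
        if i % cache_interval == 0:
            cached = model.deep(x, t)   # full (expensive) pass refreshes the cache
        eps = model.shallow(x, cached)  # cheap pass reusing cached deep features
        x = x - step_size * eps         # placeholder update, not a real sampler
    return x
```

With 50 steps and `cache_interval=5`, the deep branch runs 10 times instead of 50. Because the shallow branch still runs every step, the end-to-end speedup is smaller than this reduction in deep-branch calls whenever the shallow branch has non-negligible cost.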

Domain-specific accelerations: MRI and CT imaging get their own tailored solutions

Medical imaging poses unique constraints: reconstruction must be both fast and accurate, and the data have special structure (e.g., k-space in MRI). Note that in MRI, "acceleration" usually refers to undersampling k-space during acquisition, so a method may target faster scanning, faster sampling, or both. Several papers exploit this structure. For accelerated MRI, HFS-SDE [2] restricts the diffusion process to high-frequency k-space, keeping the low-frequency (fully sampled) region deterministic, which accelerates sampling and improves stability. FDMR [4] combines adversarial training of a denoising diffusion GAN with a three-stage inference framework (fast generation, early-stopped adaptation, refinement) to achieve a 4–10× speedup over standard diffusion, reconstructing an image in about 8 seconds. SPIRiT-Diffusion [12] designs a model-driven SDE based on k-space self-consistency (the SPIRiT interpolation constraint), enabling high-quality reconstruction at a 10× acceleration (undersampling) rate. For CT denoising, the LDM+Cold Diffusion framework [6] achieves 14× faster sampling than standard DDPM by working in latent space and using a task-specific degradation instead of Gaussian noise. These domain-specific methods report gains over their baselines (image-domain methods for SPIRiT-Diffusion [12], DDPM for [6]) by building in prior knowledge about the measurement process, although the sources do not compare them directly with generic techniques such as distillation or caching.
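
A simplified illustration of the idea behind HFS-SDE [2] (not its actual SDE): the forward perturbation touches only the high-frequency part of k-space, while a central low-frequency block stays exactly as measured. The square shape of that block and its `radius` are hypothetical choices for the sketch, and the noise is a single perturbation step rather than a full diffusion schedule.

```python
import numpy as np


def low_frequency_mask(shape, radius):
    """Boolean mask of k-space samples with |k_y|, |k_x| <= radius (unshifted FFT layout)."""
    h, w = shape
    ky = np.fft.fftfreq(h) * h  # integer frequency indices in np.fft ordering
    kx = np.fft.fftfreq(w) * w
    KY, KX = np.meshgrid(ky, kx, indexing="ij")
    return (np.abs(KY) <= radius) & (np.abs(KX) <= radius)


def perturb_high_frequencies(image, sigma, radius, rng):
    """Add complex Gaussian noise to high-frequency k-space only.

    Low-frequency samples (assumed fully sampled) are left untouched, i.e.
    deterministic, mirroring the restriction described for HFS-SDE.
    """
    k = np.fft.fft2(image)
    low = low_frequency_mask(image.shape, radius)
    noise = sigma * (rng.standard_normal(k.shape) + 1j * rng.standard_normal(k.shape))
    k_noisy = np.where(low, k, k + noise)
    return k, k_noisy, low
```

For a 32×32 image and `radius=4`, the preserved block holds (2·4 + 1)² = 81 coefficients; only the remaining coefficients carry the stochastic part of the process.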

What the evidence leaves unresolved: generalization, training cost, and the 'free lunch' question

Despite impressive results, several open questions remain. First, most methods are demonstrated on specific model architectures (e.g., U-Net, DiT) and datasets (e.g., ImageNet, CelebA, fastMRI), and it is unclear how well they transfer to other architectures or to very large-scale models such as video diffusion. LESA [1] (text-to-image and text-to-video models) and Flash Diffusion [11] (UNet, DiT, and MMDiT backbones) report results across several architectures, but they are the exception rather than the rule. Second, distillation and learned predictors are not free to train: even the comparatively cheap Flash Diffusion [11] needs several GPU hours, and LESA [1] uses a two-stage training procedure. For practitioners with limited compute, training-free methods such as DeepCache [8] or PNDMs [7] are more accessible. Third, the 'free lunch' question: can you accelerate without any quality loss? The reported results suggest not at high speedups. The methods that compare against their own full-step baseline show some degradation: DeepCache [8] reports a 0.22 FID increase at 4.1× speedup, and LESA [1] a 1.0% quality drop at 5×. The Multilevel Euler-Maruyama method [10] offers a polynomial speedup (up to 4× on CelebA 64×64) but requires training multiple UNets of increasing size, which may not be practical for all users. In short, the best method depends on your priorities: maximum speed, minimum quality loss, or minimum training cost.
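
The Multilevel Euler-Maruyama method [10] builds on the standard Euler-Maruyama discretization of an SDE. Below is a minimal single-level sketch with the drift passed in as a plain function (a hypothetical stand-in for a UNet-based drift). It is not the multilevel scheme itself: per [10], ML-EM combines UNets of increasing size so that the largest one is evaluated only a few times. This sketch only counts drift evaluations, making explicit the per-step cost that such a scheme reduces.

```python
import numpy as np


def euler_maruyama(x0, drift, diffusion, t0, t1, num_steps, rng):
    """Fixed-step Euler-Maruyama for dx = drift(x, t) dt + diffusion(t) dW.

    Works for forward (t1 > t0) or reverse-time (t1 < t0) integration.
    Returns the final state and the number of drift evaluations.
    """
    dt = (t1 - t0) / num_steps
    x = np.asarray(x0, dtype=float)
    t = t0
    drift_evals = 0
    for _ in range(num_steps):
        dw = rng.standard_normal(x.shape) * np.sqrt(abs(dt))  # Brownian increment
        x = x + drift(x, t) * dt + diffusion(t) * dw
        drift_evals += 1  # one network call per step if drift is a network
        t += dt
    return x, drift_evals
```

With zero diffusion and drift `-x`, the scheme reduces to explicit Euler, so 10 steps over [0, 1] scale the state by 0.9¹⁰. Running a real diffusion model through it would require a trained score network, which is outside this sketch.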

Sources used in this answer

[1] LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration

LESA uses a learnable stage-aware predictor (KAN-based) to cache features, achieving 5× acceleration on FLUX.1-dev with only 1.0% quality drop and 6.25× on Qwen-Image with 20.2% quality improvement over TaylorSeer.

2026 · Peiliang Cai, Jiacheng Liu, Haowen Xu, Xinyu Wang, Chang Zou, Linfeng Zhang · arXiv (Cornell University)

[2] High-Frequency Space Diffusion Model for Accelerated MRI

HFS-SDE restricts diffusion to high-frequency k-space for accelerated MRI, improving reconstruction accuracy and stability while accelerating sampling.

2024 · Chentao Cao, Zhuo-Xu Cui, Yue Wang, Shaonan Liu, Taijin Chen, Hairong Zheng, Dong Liang, Yanjie Zhu · IEEE Transactions on Medical Imaging

[3] On Distillation of Guided Diffusion Models

Distillation of classifier-free guided diffusion into a single model enables 1–4 step sampling, achieving up to 256× speedup on pixel-space models and at least 10× on latent-space models like Stable Diffusion.

2023 · Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik P. Kingma, Stefano Ermon, Jonathan Ho, Tim Salimans · CVPR

[4] Fast unconditional diffusion model for accelerated MRI reconstruction

FDMR combines adversarial training of a denoising diffusion GAN with a three-stage inference framework, achieving 4–10× faster MRI reconstruction (8 seconds per image) with superior accuracy.

2025 · Guijiao Zhao, Chen Zhou, Jianxing Liu, Yue Hu, Peng Li · Magnetic Resonance Imaging

[5] Towards More Accurate Diffusion Model Acceleration with a Timestep Tuner

A timestep tuner adjusts the integral direction at each denoising step, improving FID from 9.65 to 6.07 on LSUN Bedroom with only 10 steps when applied to DDIM.

2024 · Mengfei Xia, Yujun Shen, Changsong Lei, Yu Zhou, Deli Zhao, Ran Yi, Wenping Wang, Yong-Jin Liu · CVPR

[6] Accelerating Diffusion: Task-Optimized latent diffusion models for rapid CT denoising

Integrating Latent Diffusion Model with Cold Diffusion Process for CT denoising achieves 2× faster training and 14× faster sampling while outperforming DDPM in PSNR, SSIM, and RMSE.

2025 · Jongmin Jee, Won Chang, Euyoung Kim, Kyongjoon Lee · Computers in Biology and Medicine

[7] Pseudo Numerical Methods for Diffusion Models on Manifolds

Pseudo Numerical Methods (PNDMs) treat diffusion as solving ODEs on manifolds, generating higher-quality images in 50 steps than DDIMs in 1000 steps (20× speedup) and outperforming DDIMs with 250 steps by ~0.4 FID.

2022 · Luping Liu, Yi Ren, Zhijie Lin, Zhou Zhao · ICLR

[8] DeepCache: Accelerating Diffusion Models for Free

DeepCache caches and reuses high-level U-Net features across denoising steps, achieving 2.3× speedup for Stable Diffusion v1.5 with only 0.05 CLIP Score drop and 4.1× for LDM-4-G with 0.22 FID increase.

2024 · Xinyin Ma, Gongfan Fang, Xinchao Wang · CVPR

[9] Utilizing Latent Diffusion Model to Accelerate Sampling Speed and Enhance Text Generation Quality

LaDiffuSeq performs diffusion in a low-dimensional latent space for text generation, quadrupling sampling speed while improving BERTScore by up to 0.105 and reducing perplexity by up to 4.562 on real-world datasets.

2024 · Chenyang Li, Long Zhang, Qiusheng Zheng · Electronics

[10] Polynomial Speedup in Diffusion Models with the Multilevel Euler-Maruyama Method

The Multilevel Euler-Maruyama method uses UNets of increasing sizes to achieve polynomial speedup (up to 4× on CelebA 64×64) by requiring only a few evaluations of the largest, most accurate UNet.

2026 · Arthur Jacot · arXiv (Cornell University)

[11] Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Flash Diffusion distills any conditional diffusion model (UNet, DiT, MMDiT) into a few-step generator, achieving state-of-the-art FID and CLIP-Score on COCO with only several GPU hours of training.

2025 · Clément Chadebec, Onur Tasar, Eyal Benaroche, Benjamin Aubin · AAAI

[12] SPIRiT-Diffusion: Self-Consistency Driven Diffusion Model for Accelerated MRI

SPIRiT-Diffusion designs a model-driven SDE based on k-space self-consistency for MRI, enabling high-quality reconstruction at 10× acceleration, outperforming image-domain methods.

2025 · Zhuo-Xu Cui, Chentao Cao, Yue Wang, Sen Jia, Jing Cheng, Xin Liu, Hairong Zheng, Dong Liang, Yanjie Zhu · IEEE Transactions on Medical Imaging
