The core trade-off: how much speed can you gain without ruining image quality?
Every acceleration method for diffusion models faces the same fundamental tension: fewer sampling steps (or cheaper steps) mean faster generation, but often at the cost of blurrier, noisier, or less coherent outputs. The best methods keep quality nearly intact while cutting inference time dramatically. For instance, the distillation approach of Meng et al. [3] produces images visually comparable to the original model using as few as 4 sampling steps on ImageNet 64×64, with FID/IS scores close to the original and speedups of up to 256× on pixel-space models. Similarly, Flash Diffusion [11] reaches state-of-the-art FID and CLIP-Score for few-step generation on COCO while requiring only several GPU hours of training. On the other hand, simpler caching methods like DeepCache [8] offer more modest speedups (2.3× for Stable Diffusion v1.5) but with almost no quality degradation (only a 0.05 drop in CLIP Score). The takeaway: if you need extreme speed (e.g., real-time generation), distillation is the strongest option reported here; if preserving quality matters most, caching or better numerical solvers may be safer.
Four main families of acceleration methods — and how they compare
Most of the papers cluster into four strategies. First, distillation methods [3][11] train a model to mimic the original's output in far fewer steps. For example, Meng et al. [3] distilled classifier-free guided diffusion into a single model that needs just 1–4 steps, accelerating inference by at least 10× on latent-space models like Stable Diffusion. Second, feature caching methods [1][8] exploit redundancy across denoising steps: DeepCache [8] reuses high-level U-Net features across adjacent steps, achieving a 4.1× speedup on LDM-4-G with only a 0.22 FID increase on ImageNet. LESA [1] goes further with a learnable predictor that adapts to different noise levels, yielding 5× acceleration on FLUX.1-dev with just a 1.0% quality drop. Third, better numerical solvers [5][7] improve the discretization of the underlying differential equation. PNDMs [7] treat diffusion as solving ODEs on manifolds and generate higher-quality images in 50 steps than DDIMs do in 1000 steps (a 20× reduction in step count). The timestep tuner [5] adjusts the integral direction for each interval, improving FID from 9.65 to 6.07 on LSUN Bedroom with only 10 steps when applied to DDIM. Fourth, latent-space methods [6][9] run the diffusion process in a compressed representation. LaDiffuSeq [9] quadruples text generation sampling speed by working in a low-dimensional latent space, while the LDM+Cold Diffusion framework [6] achieves 14× faster sampling for CT denoising by replacing Gaussian noise with a task-specific degradation.
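To make the feature-caching idea concrete, here is a minimal toy sketch in Python. It is not DeepCache's or LESA's actual implementation: the two "branches" are small random linear maps standing in for the cheap and expensive parts of a denoiser, and the names `shallow_features`, `deep_features`, `sample`, and `cache_interval` are all hypothetical. The only point it illustrates is that the expensive branch is recomputed every `cache_interval` steps and reused in between.

```python
import numpy as np

# Fixed random weights standing in for a denoiser's layers (illustrative only).
_rng = np.random.default_rng(0)
W_SHALLOW = _rng.normal(scale=0.1, size=(8, 8))
W_DEEP = _rng.normal(scale=0.1, size=(8, 8))


def shallow_features(x, step):
    """Cheap layers: recomputed at every denoising step."""
    return np.tanh(x @ W_SHALLOW + 0.01 * step)


def deep_features(h):
    """Stand-in for the expensive high-level branch whose output gets cached."""
    return np.tanh(h @ W_DEEP)


def sample(x, num_steps, cache_interval):
    """Run a toy denoising loop, refreshing the deep branch every `cache_interval` steps.

    Returns the final state and the number of expensive deep-branch evaluations.
    """
    cached = None
    deep_calls = 0
    for step in range(num_steps):
        h = shallow_features(x, step)
        if step % cache_interval == 0:
            # Refresh the cache; on all other steps the stale value is reused.
            cached = deep_features(h)
            deep_calls += 1
        x = x - 0.1 * (h + cached)  # toy update rule, not a real sampler
    return x, deep_calls


if __name__ == "__main__":
    x0 = np.ones((2, 8))
    _, calls_full = sample(x0, num_steps=20, cache_interval=1)
    _, calls_cached = sample(x0, num_steps=20, cache_interval=5)
    print(calls_full, calls_cached)
```

With `cache_interval=1` the loop degenerates to ordinary sampling. Larger intervals cut the number of expensive evaluations proportionally, and the approximation error from reusing stale features is where a quality drop such as the reported 0.22 FID increase would come from.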
Domain-specific accelerations: MRI and CT imaging get their own tailored solutions
Medical imaging poses unique constraints: reconstruction must be both fast and accurate, and the data has special structure (e.g., k-space in MRI). Several papers propose methods that exploit this structure. For accelerated MRI, HFS-SDE [2] restricts the diffusion process to high-frequency k-space regions, so the low-frequency (fully sampled) regions remain deterministic, which accelerates sampling and improves stability. FDMR [4] combines adversarial training with a three-stage inference framework (fast generation, early-stopped adaptation, refinement) to achieve a 4–10× speedup over standard diffusion, reconstructing an image in just 8 seconds. SPIRiT-Diffusion [12] designs a custom SDE based on the physics of k-space interpolation, enabling high-quality reconstruction at 10× acceleration. For CT denoising, the LDM+Cold Diffusion framework [6] achieves 14× faster sampling than standard DDPM by working in latent space and using a task-specific degradation instead of Gaussian noise. The reported results suggest that building prior knowledge about the measurement process into the method pays off: the CT framework [6] outperforms DDPM in PSNR, SSIM, and RMSE, and SPIRiT-Diffusion [12] outperforms image-domain methods.
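The idea of confining the diffusion to high-frequency k-space can be sketched with NumPy. This is only an illustration of the masking idea, not the HFS-SDE forward process itself: the square central mask, the `center_fraction` parameter, and the function names are assumptions made for the example.

```python
import numpy as np


def low_frequency_mask(shape, center_fraction):
    """Boolean mask that is True on a central block of centered (fftshifted) k-space."""
    rows, cols = shape
    half_r = max(1, int(rows * center_fraction / 2))
    half_c = max(1, int(cols * center_fraction / 2))
    r0, c0 = rows // 2, cols // 2
    mask = np.zeros(shape, dtype=bool)
    mask[r0 - half_r:r0 + half_r, c0 - half_c:c0 + half_c] = True
    return mask


def perturb_high_frequencies(image, sigma, center_fraction=0.25, seed=0):
    """Add complex Gaussian noise only outside the low-frequency block.

    Returns the noisy k-space, the clean k-space, and the low-frequency mask.
    """
    rng = np.random.default_rng(seed)
    kspace = np.fft.fftshift(np.fft.fft2(image))
    low = low_frequency_mask(image.shape, center_fraction)
    noise = sigma * (rng.standard_normal(image.shape)
                     + 1j * rng.standard_normal(image.shape))
    # Zero out the noise wherever the mask marks low frequencies.
    noisy = kspace + np.where(low, 0.0, noise)
    return noisy, kspace, low


if __name__ == "__main__":
    img = np.arange(256, dtype=float).reshape(16, 16)
    noisy, clean, low = perturb_high_frequencies(img, sigma=1.0)
    # The low-frequency block is untouched by the perturbation.
    print(np.array_equal(noisy[low], clean[low]))
```

Because the low-frequency block never receives noise, a reverse process built on this kind of perturbation only has to recover the high-frequency content, which is consistent with the stability and speed benefits described above.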
What the evidence leaves unresolved: generalization, training cost, and the 'free lunch' question
Despite impressive results, several open questions remain. First, most methods are demonstrated on specific model architectures (e.g., U-Net, DiT) and datasets (e.g., ImageNet, CelebA, fastMRI). It is unclear how well they generalize to newer architectures like MMDiT or to very large-scale models (e.g., video diffusion). Flash Diffusion [11], which targets U-Net, DiT, and MMDiT backbones, and LESA [1], which shows generalization across text-to-image and text-to-video models, are the exceptions rather than the rule. Second, distillation and learned predictors add a training cost that training-free methods avoid: even the comparatively cheap Flash Diffusion [11] needs 'several GPU hours,' and LESA [1] uses two-stage training. For practitioners with limited compute, training-free methods like DeepCache [8] or PNDMs [7] are more accessible. Third, the 'free lunch' question: can you accelerate without any quality loss? The caching results suggest not at high speedups: DeepCache [8] reports a 0.22 FID increase at 4.1× speedup, and LESA [1] a 1.0% quality drop at 5× speedup. The solver results complicate the picture, since PNDMs [7] report better quality in 50 steps than DDIMs in 1000, which shows that the answer depends on the baseline being compared against. The Multilevel Euler-Maruyama method [10] offers a polynomial speedup (up to 4× on CelebA 64×64) but requires training multiple U-Nets of increasing size, which may not be practical for all users. In short, the best method depends on your specific trade-off priorities: maximum speed, minimum quality loss, or minimum training cost.
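The closing trade-off can be made explicit with a small Python shortlist over the figures quoted in this answer. This is only a bookkeeping sketch: the `ReportedResult` record and the `shortlist` helper are hypothetical, and the speedups come from different models, datasets, and baselines, so the ordering is indicative rather than a like-for-like ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedResult:
    method: str
    citation: int          # bracketed source number used in this answer
    speedup: float         # headline speedup as quoted above ("up to" where stated)
    needs_training: bool   # False for the training-free methods named above


# Figures copied from the text; they are not measured on a common benchmark.
REPORTED = [
    ReportedResult("Guided distillation", 3, 256.0, True),
    ReportedResult("PNDM", 7, 20.0, False),
    ReportedResult("LESA", 1, 5.0, True),
    ReportedResult("DeepCache (LDM-4-G)", 8, 4.1, False),
    ReportedResult("Multilevel Euler-Maruyama", 10, 4.0, True),
]


def shortlist(results, min_speedup=1.0, allow_training=True):
    """Keep methods meeting the speed and training constraints, fastest first."""
    keep = [
        r for r in results
        if r.speedup >= min_speedup and (allow_training or not r.needs_training)
    ]
    return sorted(keep, key=lambda r: r.speedup, reverse=True)


if __name__ == "__main__":
    for r in shortlist(REPORTED, allow_training=False):
        print(f"[{r.citation}] {r.method}: {r.speedup}x")
```

Calling `shortlist(REPORTED, allow_training=False)` leaves only the two training-free entries, which mirrors the advice above for practitioners with limited compute.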
Sources used in this answer
[1] LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration
LESA uses a learnable stage-aware predictor (KAN-based) to cache features, achieving 5× acceleration on FLUX.1-dev with only 1.0% quality drop and 6.25× on Qwen-Image with 20.2% quality improvement over TaylorSeer.
[2] High-Frequency Space Diffusion Model for Accelerated MRI
HFS-SDE restricts diffusion to high-frequency k-space for accelerated MRI, improving reconstruction accuracy and stability while accelerating sampling.
[3] On Distillation of Guided Diffusion Models
Distillation of classifier-free guided diffusion into a single model enables 1–4 step sampling, achieving up to 256× speedup on pixel-space models and at least 10× on latent-space models like Stable Diffusion.
[4] Fast unconditional diffusion model for accelerated MRI reconstruction
FDMR combines adversarial training of a denoising diffusion GAN with a three-stage inference framework, achieving 4–10× faster MRI reconstruction (8 seconds per image) with superior accuracy.
[5] Towards More Accurate Diffusion Model Acceleration with a Timestep Tuner
A timestep tuner adjusts the integral direction at each denoising step, improving FID from 9.65 to 6.07 on LSUN Bedroom with only 10 steps when applied to DDIM.
[6] Accelerating Diffusion: Task-Optimized latent diffusion models for rapid CT denoising
Integrating Latent Diffusion Model with Cold Diffusion Process for CT denoising achieves 2× faster training and 14× faster sampling while outperforming DDPM in PSNR, SSIM, and RMSE.
[7] Pseudo Numerical Methods for Diffusion Models on Manifolds
Pseudo Numerical Methods (PNDMs) treat diffusion as solving ODEs on manifolds, generating higher-quality images in 50 steps than DDIMs in 1000 steps (20× speedup) and outperforming DDIMs with 250 steps by ~0.4 FID.
[8] DeepCache: Accelerating Diffusion Models for Free
DeepCache caches and reuses high-level U-Net features across denoising steps, achieving 2.3× speedup for Stable Diffusion v1.5 with only 0.05 CLIP Score drop and 4.1× for LDM-4-G with 0.22 FID increase.
[9] Utilizing Latent Diffusion Model to Accelerate Sampling Speed and Enhance Text Generation Quality
LaDiffuSeq performs diffusion in a low-dimensional latent space for text generation, quadrupling sampling speed while improving BERTScore by up to 0.105 and reducing perplexity by up to 4.562 on real-world datasets.
[10] Polynomial Speedup in Diffusion Models with the Multilevel Euler-Maruyama Method
The Multilevel Euler-Maruyama method uses UNets of increasing sizes to achieve polynomial speedup (up to 4× on CelebA 64×64) by requiring only a few evaluations of the largest, most accurate UNet.
[11] Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
Flash Diffusion distills any conditional diffusion model (UNet, DiT, MMDiT) into a few-step generator, achieving state-of-the-art FID and CLIP-Score on COCO with only several GPU hours of training.
[12] SPIRiT-Diffusion: Self-Consistency Driven Diffusion Model for Accelerated MRI
SPIRiT-Diffusion designs a model-driven SDE based on k-space self-consistency for MRI, enabling high-quality reconstruction at 10× acceleration, outperforming image-domain methods.
