WisPaper

Which papers propose methods to accelerate the generation process in diffusion models?

Researchers accelerate diffusion models through distillation, feature caching, improved numerical solvers, and latent-space diffusion, with reported speedups ranging from 2.3× to 256× at a small cost in quality.

Direct answer

Researchers have proposed four families of methods to speed up diffusion models: distillation (training a student model that samples in fewer steps) [3][11], feature caching (reusing computations across steps) [1][8], better numerical solvers (fewer steps with less discretization error) [5][7], and latent-space diffusion (working in a compressed representation) [6][9]. For example, distillation can produce high-quality images in as few as 1–4 steps instead of hundreds, yielding up to 256× speedup on pixel-space models [3], while caching methods like DeepCache achieve 2.3–4.1× acceleration with a small quality drop [8]. The key trade-off is between speed and image fidelity: most methods sacrifice some quality at very high speedups, but recent work narrows that gap significantly.

12 sources cited

This article was generated with WisPaper-powered search and paper analysis.

The core trade-off: how much speed can you gain without ruining image quality?

Every acceleration method for diffusion models faces the same fundamental tension: fewer sampling steps (or cheaper steps) mean faster generation, but often at the cost of blurrier, noisier, or less coherent outputs. The best methods keep quality nearly intact while cutting inference time dramatically. For instance, the distillation approach of Meng et al. [3] produces images visually comparable to the original model using as few as 4 sampling steps on ImageNet 64×64, sampling up to 256× faster while keeping FID/IS scores close to the original. Similarly, Flash Diffusion [11] reaches state-of-the-art FID and CLIP-Score for few-step generation on COCO with only several GPU hours of training. Caching methods like DeepCache [8], by contrast, offer more modest speedups (2.3× for Stable Diffusion v1.5) but with almost no quality degradation (only a 0.05 drop in CLIP Score). The takeaway: if you need extreme speed (e.g., real-time generation), distillation is the strongest option; if preserving quality matters most, caching or better numerical solvers may be safer.

Four main families of acceleration methods and how they compare

The papers cluster into four distinct strategies. First, distillation methods [3][11] train a student model to mimic the original's output in far fewer steps. For example, Meng et al. [3] distilled classifier-free guided diffusion into a single model that needs just 1–4 steps, accelerating inference by at least 10× on latent-space models like Stable Diffusion. Second, feature caching methods [1][8] exploit redundancy across denoising steps: DeepCache [8] reuses high-level U-Net features across adjacent steps, achieving 4.1× speedup on LDM-4-G with only a 0.22 FID increase on ImageNet. LESA [1] goes further with a learnable stage-aware predictor that adapts to different noise levels, yielding 5× acceleration on FLUX.1-dev with just a 1.0% quality drop. Third, better numerical solvers [5][7] improve the discretization of the underlying differential equation. PNDMs [7] treat diffusion as solving ODEs on manifolds and generate higher-quality images in 50 steps than DDIMs in 1000 steps (a 20× reduction in steps). The timestep tuner [5] adjusts the integral direction for each interval, improving FID from 9.65 to 6.07 on LSUN Bedroom with only 10 steps when applied to DDIM. Fourth, latent-space methods [6][9] run the diffusion process in a compressed representation. LaDiffuSeq [9] quadruples text generation sampling speed by working in a low-dimensional latent space, while the LDM+Cold Diffusion framework [6] achieves 14× faster sampling for CT denoising by replacing Gaussian noise with a task-specific degradation.
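To make the feature-caching idea concrete, here is a minimal toy sketch in Python. It is not DeepCache's or LESA's implementation: `deep_branch`, `shallow_branch`, `sample`, and `cache_interval` are hypothetical names, and the "network" is a stand-in arithmetic function. It only illustrates the general pattern of recomputing expensive deep features every few steps and reusing the cached copy in between.

```python
import math

def deep_branch(x, t):
    """Hypothetical stand-in for the expensive deep layers of a denoiser."""
    return [math.tanh(v) * (1.0 - 0.01 * t) for v in x]

def shallow_branch(x, deep_features):
    """Hypothetical stand-in for the cheap layers that consume deep features."""
    return [0.9 * v + 0.1 * f for v, f in zip(x, deep_features)]

def sample(x, num_steps, cache_interval):
    """Run num_steps toy denoising steps.

    The deep branch is recomputed only every cache_interval steps;
    the steps in between reuse the cached deep features.
    """
    cached = None
    deep_calls = 0
    for step in range(num_steps):
        t = num_steps - step
        if cached is None or step % cache_interval == 0:
            cached = deep_branch(x, t)  # refresh the cache
            deep_calls += 1
        x = shallow_branch(x, cached)   # the cheap path always runs
    return x, deep_calls

x0 = [0.5, -1.0, 2.0]
full, full_calls = sample(x0, num_steps=50, cache_interval=1)
fast, fast_calls = sample(x0, num_steps=50, cache_interval=5)
max_err = max(abs(a - b) for a, b in zip(full, fast))
print(full_calls, fast_calls)  # 50 10
print(max_err)                 # deviation introduced by reusing stale features
```

In this toy, a cache interval of 5 cuts deep-branch evaluations from 50 to 10. The end-to-end speedup is smaller than 5× because the shallow path still runs at every step, and the deviation from the uncached result grows as the interval increases, which mirrors the speed/quality trade-off the caching papers report.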

Domain-specific accelerations: MRI and CT imaging get their own tailored solutions

Medical imaging poses unique constraints: reconstruction must be both fast and accurate, and the data has special structure (e.g., k-space in MRI). Several papers propose methods that exploit this structure. For accelerated MRI, HFS-SDE [2] restricts the diffusion process to high-frequency k-space regions, so that the low-frequency (fully sampled) regions remain deterministic, which accelerates sampling and improves stability. FDMR [4] combines adversarial training with a three-stage inference framework (fast generation, early-stopped adaptation, refinement) to achieve 4–10× speedup over standard diffusion, reconstructing an image in just 8 seconds. SPIRiT-Diffusion [12] designs a custom SDE based on the physics of k-space interpolation, enabling high-quality reconstruction at 10× acceleration. For CT denoising, the LDM+Cold Diffusion framework [6] achieves 14× faster sampling than standard DDPM by working in latent space and using a task-specific degradation instead of Gaussian noise. What these domain-specific methods share is that they build prior knowledge about the measurement process into the diffusion process itself, which is how they gain speed without giving up reconstruction accuracy.
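The idea of confining the stochastic part of the process to high-frequency k-space can be illustrated with a small toy in Python. This is not the HFS-SDE algorithm: it assumes a centered 1D k-space line, and `high_frequency_mask`, `perturb_high_frequencies`, `low_freq_radius`, and `sigma` are hypothetical names chosen for illustration. It only shows noise being applied where a mask marks high frequencies while the central low-frequency band is left untouched.

```python
import random

def high_frequency_mask(n, low_freq_radius):
    """Return 1 for high-frequency indices and 0 for the central band.

    Assumes a centered 1D k-space line of length n (DC at index n // 2).
    """
    center = n // 2
    return [0 if abs(i - center) <= low_freq_radius else 1 for i in range(n)]

def perturb_high_frequencies(kspace, mask, sigma, rng):
    """Add complex Gaussian noise only where mask == 1."""
    noisy = []
    for value, is_high in zip(kspace, mask):
        if is_high:
            noise = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
            noisy.append(value + sigma * noise)
        else:
            noisy.append(value)  # low-frequency sample stays deterministic
    return noisy

rng = random.Random(0)
kspace = [complex(i, -i) for i in range(8)]  # made-up toy data
mask = high_frequency_mask(len(kspace), low_freq_radius=1)
noisy = perturb_high_frequencies(kspace, mask, sigma=0.1, rng=rng)
print(mask)  # [1, 1, 1, 0, 0, 0, 1, 1]
```

Because the central entries are never perturbed, a reverse process only has to recover the masked high-frequency entries, which is one intuition for why restricting the diffusion region can shorten sampling.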

What the evidence leaves unresolved: generalization, training cost, and the 'free lunch' question

Despite impressive results, several open questions remain. First, most methods are demonstrated on specific model architectures (e.g., U-Net, DiT) and datasets (e.g., ImageNet, CelebA, fastMRI), so it is unclear how well they generalize to newer architectures or to very large-scale models (e.g., video diffusion). Flash Diffusion [11] is reported to work across UNet, DiT, and MMDiT backbones, and LESA [1] shows generalization across text-to-image and text-to-video models, but these are the exceptions rather than the rule. Second, distillation and learned predictors add a training cost: Flash Diffusion [11] requires 'several GPU hours,' while LESA [1] uses two-stage training. For practitioners with limited compute, training-free methods like DeepCache [8] or PNDMs [7] are more accessible. Third, the 'free lunch' question: can you accelerate without any quality loss? For the most aggressive speedups, the evidence here says no. DeepCache [8] reports a 0.22 FID increase at 4.1× speedup, and LESA [1] a 1.0% quality drop at 5× speedup. Improved solvers come closest to a free lunch at moderate step counts: PNDMs [7] report higher quality in 50 steps than DDIMs in 1000. The Multilevel Euler-Maruyama method [10] offers a polynomial speedup (up to 4× on CelebA 64×64) but requires training multiple UNets of increasing size, which may not be practical for all users. In short, the best method depends on your specific trade-off priorities: maximum speed, minimum quality loss, or minimum training cost.

Sources used in this answer

1

LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration

LESA uses a learnable stage-aware predictor (KAN-based) to cache features, achieving 5× acceleration on FLUX.1-dev with only 1.0% quality drop and 6.25× on Qwen-Image with 20.2% quality improvement over TaylorSeer.

2

High-Frequency Space Diffusion Model for Accelerated MRI

HFS-SDE restricts diffusion to high-frequency k-space for accelerated MRI, improving reconstruction accuracy and stability while accelerating sampling.

3

On Distillation of Guided Diffusion Models

Distillation of classifier-free guided diffusion into a single model enables 1–4 step sampling, achieving up to 256× speedup on pixel-space models and at least 10× on latent-space models like Stable Diffusion.

4

Fast Unconditional Diffusion Model for Accelerated MRI Reconstruction

FDMR combines adversarial training of a denoising diffusion GAN with a three-stage inference framework, achieving 4–10× faster MRI reconstruction (8 seconds per image) with superior accuracy.

5

Towards More Accurate Diffusion Model Acceleration with a Timestep Tuner

A timestep tuner adjusts the integral direction at each denoising step, improving FID from 9.65 to 6.07 on LSUN Bedroom with only 10 steps when applied to DDIM.

6

Accelerating Diffusion: Task-Optimized Latent Diffusion Models for Rapid CT Denoising

Integrating Latent Diffusion Model with Cold Diffusion Process for CT denoising achieves 2× faster training and 14× faster sampling while outperforming DDPM in PSNR, SSIM, and RMSE.

7

Pseudo Numerical Methods for Diffusion Models on Manifolds

Pseudo Numerical Methods (PNDMs) treat diffusion as solving ODEs on manifolds, generating higher-quality images in 50 steps than DDIMs in 1000 steps (20× speedup) and outperforming DDIMs with 250 steps by ~0.4 FID.

8

DeepCache: Accelerating Diffusion Models for Free

DeepCache caches and reuses high-level U-Net features across denoising steps, achieving 2.3× speedup for Stable Diffusion v1.5 with only 0.05 CLIP Score drop and 4.1× for LDM-4-G with 0.22 FID increase.

9

Utilizing Latent Diffusion Model to Accelerate Sampling Speed and Enhance Text Generation Quality

LaDiffuSeq performs diffusion in a low-dimensional latent space for text generation, quadrupling sampling speed while improving BERTScore by up to 0.105 and reducing perplexity by up to 4.562 on real-world datasets.

10

Polynomial Speedup in Diffusion Models with the Multilevel Euler-Maruyama Method

The Multilevel Euler-Maruyama method uses UNets of increasing sizes to achieve polynomial speedup (up to 4× on CelebA 64×64) by requiring only a few evaluations of the largest, most accurate UNet.

11

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Flash Diffusion distills any conditional diffusion model (UNet, DiT, MMDiT) into a few-step generator, achieving state-of-the-art FID and CLIP-Score on COCO with only several GPU hours of training.

12

SPIRiT-Diffusion: Self-Consistency Driven Diffusion Model for Accelerated MRI

SPIRiT-Diffusion designs a model-driven SDE based on k-space self-consistency for MRI, enabling high-quality reconstruction at 10× acceleration, outperforming image-domain methods.