Thermodynamic diffusion models — the early formulation that introduced gradual noising and learned reversal as a generative strategy (Sohl-Dickstein et al., 2015).
DDPM — the paper that made diffusion models practical and widely adopted by training a network to predict injected noise (Ho et al., 2020).
Score-based diffusion — the stochastic-differential-equation view that connects diffusion models, score matching, and continuous-time sampling (Song et al., 2021).
A diffusion model learns to generate data by reversing a gradual corruption process. We begin with a clean sample \(\mathbf{x}_0\), add small amounts of Gaussian noise over many steps until the sample becomes almost pure noise, and then train a neural network to undo that corruption one step at a time.
Conceptually, diffusion models are attractive because they replace one hard generative jump with many easy denoising subproblems. Instead of asking a network to directly map random noise to a realistic geological or physical sample, we ask it a sequence of simpler questions: “given a slightly corrupted sample, what noise was added?” or equivalently “how should this point move to become a little cleaner?”
14.1 The forward noising process
In a discrete diffusion model, the forward process defines a Markov chain:

\[
q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\ \beta_t \mathbf{I}\right),
\]

where \(\beta_t\) is a small variance schedule. Repeating this many times produces progressively noisier states \(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T\). A convenient consequence is that any intermediate state can be sampled directly from the clean sample in closed form:

\[
q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right),
\]

with \(\alpha_t = 1 - \beta_t\) and \(\bar{\alpha}_t = \prod_{s=1}^t \alpha_s\). This says that every noisy sample is simply a weighted combination of the clean sample and a single draw of Gaussian noise.
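The closed-form shortcut can be verified numerically: iterating the one-step chain to step \(T\) and jumping there directly produce the same distribution. The schedule values and the toy "clean" distribution below are illustrative choices, not values from any particular paper.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 100
betas = np.linspace(1e-4, 0.05, T)        # small variance schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)            # cumulative product defining the jump

x0 = rng.normal(0.2, 0.05, 100_000)       # toy clean samples

# Iterate the Markov chain one step at a time
x = x0.copy()
for t in range(T):
    x = np.sqrt(1 - betas[t]) * x + np.sqrt(betas[t]) * rng.normal(size=x.shape)

# Jump directly to step T using the closed form
x_direct = np.sqrt(alpha_bar[-1]) * x0 + np.sqrt(1 - alpha_bar[-1]) * rng.normal(size=x0.shape)

# Both endpoint distributions should agree closely in mean and spread
print(x.mean(), x_direct.mean())
print(x.std(), x_direct.std())
```

Both routes land on the same Gaussian, which is why training can corrupt any sample to any step in a single draw instead of simulating the whole chain.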
14.2 The learned reverse process
The reverse process is parameterized by a neural network, often written as \(\epsilon_\theta(\mathbf{x}_t, t)\), that predicts the noise contained in a noisy sample. Training minimizes a simple mean-squared error:

\[
\mathcal{L}(\theta) = \mathbb{E}_{t,\,\mathbf{x}_0,\,\boldsymbol{\epsilon}}\left[\left\|\boldsymbol{\epsilon} - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon},\; t\right)\right\|^2\right],
\]

where \(\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\) is the injected noise.
This objective is one of the main reasons diffusion models are stable: they reduce generative modeling to supervised regression on synthetic noise-corruption pairs.
At sampling time, we start from Gaussian noise and repeatedly apply the learned denoising update from \(t=T\) back to \(t=1\). Each reverse step is small, but together they transform noise into a sample drawn from the learned distribution.
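Concretely, one standard form of the denoising update is the DDPM step of Ho et al. (2020), which sets the reverse variance to \(\beta_t\):

\[
\mathbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(\mathbf{x}_t, t)\right) + \sqrt{\beta_t}\,\mathbf{z}, \qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
\]

with \(\mathbf{z} = \mathbf{0}\) at the final step. Other variance choices exist; this one is the simplest and is what the code example in this chapter uses.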
14.3 Code example: learning a 1D porosity prior with a diffusion model
We use a tiny diffusion model to learn a bimodal 1D porosity distribution representing two simplified rock populations. The point of the example is not image generation; it is to show the core workflow on a distribution that is easy to visualize. The network input is a noisy porosity value together with the diffusion time step, and the output is the predicted injected noise.
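A minimal end-to-end sketch follows. The rock-population parameters, network size, schedule, and learning rate are all illustrative choices for this toy problem, not canonical values; the two-layer NumPy network with hand-written gradients stands in for whatever deep-learning framework you would use in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Bimodal porosity data: two simplified rock populations (illustrative) ---
n = 4000
pop_a = rng.normal(0.08, 0.015, n // 2)    # tight, low-porosity rocks
pop_b = rng.normal(0.25, 0.025, n // 2)    # broader, high-porosity rocks
data = np.concatenate([pop_a, pop_b])
mu, sigma = data.mean(), data.std()
x0 = (data - mu) / sigma                   # standardize for diffusion

# --- Forward-process schedule ---
T = 50
betas = np.linspace(1e-3, 0.2, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

# --- Tiny MLP eps_theta(x_t, t): inputs (x_t, t/T) -> 64 tanh units -> 1 ---
H = 64
W1 = rng.normal(0, 0.5, (H, 2)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (1, H)); b2 = np.zeros(1)

def net(xt, tfrac):
    inp = np.stack([xt, tfrac])            # shape (2, batch)
    h = np.tanh(W1 @ inp + b1[:, None])
    return (W2 @ h + b2[:, None])[0], (inp, h)

# --- Training: regress the injected noise from the noisy sample and step ---
lr, batch = 2e-2, 256
for step in range(8000):
    idx = rng.integers(0, n, batch)
    t = rng.integers(0, T, batch)
    eps = rng.normal(size=batch)
    xt = np.sqrt(alpha_bar[t]) * x0[idx] + np.sqrt(1 - alpha_bar[t]) * eps
    pred, (inp, h) = net(xt, t / T)
    err = pred - eps                       # d(MSE)/d(pred), constant 2 absorbed in lr
    # Manual backprop through the two-layer MLP
    gW2 = (err[None, :] @ h.T) / batch
    gb2 = err.mean()
    dh = (W2.T @ err[None, :]) * (1 - h**2)
    gW1 = (dh @ inp.T) / batch
    gb1 = dh.mean(axis=1)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

# --- Sampling: start from pure noise and denoise from t = T-1 down to 0 ---
m = 2000
x = rng.normal(size=m)
for t in range(T - 1, -1, -1):
    eps_hat, _ = net(x, np.full(m, t / T))
    x = (x - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
    if t > 0:                              # no noise on the final step
        x += np.sqrt(betas[t]) * rng.normal(size=m)

samples = x * sigma + mu                   # back to porosity units
```

Plotting `np.histogram(samples)` against a histogram of `data` is the natural way to inspect the result.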
The generated histogram should reproduce the two main porosity modes, even if it is imperfect. That is enough to make the core point: diffusion models learn a distribution by repeatedly solving small denoising problems rather than a single direct generation problem.
14.4 When to use diffusion models
Diffusion models are especially attractive when:
You care about sample quality and distributional realism more than single-shot speed.
The target distribution is multimodal, so a single deterministic prediction would be misleading.
You want a learned prior that can later be conditioned on observations in an inverse problem.
Training stability matters more than the fastest possible sampling.
Their main drawback is sampling cost: a diffusion model typically needs many denoising steps to generate one sample. That is often acceptable for offline uncertainty quantification, but it can be limiting inside large inverse loops unless acceleration tricks are used.
14.5 Geoscience applications
Seismic interpolation and denoising — diffusion models are well matched to reconstructing missing or corrupted wavefield content because they learn realistic structure rather than only pointwise averages.
Learned priors for inverse problems — diffusion sampling can generate ensembles of geologically plausible subsurface models that are later conditioned on travel-time, waveform, or reservoir observations.
Geomodel and facies generation — diffusion models can represent multimodal geological uncertainty for channels, facies maps, and permeability fields without collapsing to a single realization.
Remote sensing and Earth observation — cloud removal, gap filling, downscaling, and super-resolution are natural conditional-generation tasks for diffusion models on spatial fields.
Uncertainty-aware surrogate workflows — because diffusion models generate ensembles rather than point estimates, they are attractive wherever geoscience decisions need uncertainty bands rather than one best model.
Overview — these applications fit the broader shift toward probabilistic, data-driven geoscience modeling reviewed in Bergen et al. (2019) and Dramsch (2020).
Bergen, K. J., Johnson, P. A., de Hoop, M. V., & Beroza, G. C. (2019). Machine learning for data-driven discovery in solid Earth geoscience. Science, 363(6433), eaau0323. https://doi.org/10.1126/science.aau0323
Dramsch, J. S. (2020). 70 years of machine learning in geoscience in review. Advances in Geophysics, 61, 1–55.
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851.
Sohl-Dickstein, J., Weiss, E. A., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. Proceedings of the 32nd International Conference on Machine Learning (ICML), 2256–2265.
Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2021). Score-based generative modeling through stochastic differential equations. Proceedings of the International Conference on Learning Representations (ICLR).