15  Flow Matching

Tip: Key references
  • Neural ODEs — the continuous-time viewpoint where a neural network parameterizes a vector field and data evolves through an ordinary differential equation (Chen et al., 2018).
  • Continuous normalizing flows — scalable continuous-time generative flows that connect latent variables to data through learned dynamics (Grathwohl et al., 2019).
  • Flow matching — the modern training objective that learns a transport vector field directly from simple reference paths between noise and data (Lipman et al., 2023).

A flow-matching model learns a continuous vector field that transports samples from a simple base distribution, usually Gaussian noise, to the target data distribution. Instead of adding noise and then reversing it as in diffusion models, flow matching learns the velocity field of a probability flow.

This perspective is useful because it turns generative modeling into a transport problem. If we know how samples should move at every time \(t \in [0, 1]\), then generation becomes a matter of integrating an ordinary differential equation from noise to data.

15.1 The basic idea

Suppose \(\mathbf{x}_0 \sim p_0\) is a simple reference sample and \(\mathbf{x}_1 \sim p_1\) is a target data sample. We define an interpolation path between them, for example the straight-line path

\[ \mathbf{x}_t = (1 - t)\,\mathbf{x}_0 + t\,\mathbf{x}_1, \qquad t \in [0, 1]. \]

For this path, the ideal velocity is simply

\[ \mathbf{u}_t = \frac{d\mathbf{x}_t}{dt} = \mathbf{x}_1 - \mathbf{x}_0. \]
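As a quick numerical sanity check (a standalone sketch with arbitrary endpoint values, not part of the chapter's training code), a central difference on the straight-line path recovers this constant velocity at any \(t\):

```julia
x0, x1 = -1.0, 2.0                  # arbitrary endpoint pair
path(t) = (1 - t) * x0 + t * x1     # straight-line interpolation

# Central-difference estimate of d(path)/dt at time t
fd(t; h = 1e-6) = (path(t + h) - path(t - h)) / (2h)

fd(0.1), fd(0.5), fd(0.9)           # each ≈ x1 - x0 = 3.0, independent of t
```

Because the path is linear in \(t\), the velocity is the same at every point along it; only the endpoints matter.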

Flow matching trains a neural network \(\mathbf{v}_\theta(\mathbf{x}, t)\) to predict that velocity from points sampled along the path. A simple objective is

\[ \mathcal{L}(\theta) = \mathbb{E}_{\mathbf{x}_0,\mathbf{x}_1,t}\left[\left\|\mathbf{v}_\theta(\mathbf{x}_t, t) - \mathbf{u}_t\right\|_2^2\right]. \]

After training, generation solves the ODE

\[ \frac{d\mathbf{x}}{dt} = \mathbf{v}_\theta(\mathbf{x}, t), \qquad \mathbf{x}(0) \sim p_0. \]

The learned dynamics push the base noise distribution toward the data distribution.
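Before training anything, the mechanics can be seen with a velocity field known in closed form. Under the deterministic coupling \(\mathbf{x}_1 = \mu + \sigma\,\mathbf{x}_0\) between \(p_0 = \mathcal{N}(0, 1)\) and \(p_1 = \mathcal{N}(\mu, \sigma^2)\), the straight-line path gives \(\mathbf{x}_t = (1 + t(\sigma - 1))\,\mathbf{x}_0 + t\mu\), so \(\mathbf{x}_0\) is recoverable from \((\mathbf{x}, t)\) and the velocity \(\mathbf{u} = \mu + (\sigma - 1)\,\mathbf{x}_0\) can be written explicitly. The following toy sketch (our own illustration, with hypothetical values of \(\mu\) and \(\sigma\)) integrates that field with forward Euler and transports unit Gaussian samples to \(\mathcal{N}(\mu, \sigma^2)\):

```julia
using Random, Statistics

μ, σ = 0.25, 0.05                      # hypothetical target mean and spread

# Closed-form velocity for the coupled path x_t = (1 + t(σ - 1)) x0 + t μ
u(x, t) = μ + (σ - 1) * (x - t * μ) / (1 + t * (σ - 1))

rng = Xoshiro(1)
x = randn(rng, 5_000)                  # samples from p0 = N(0, 1)
nsteps = 200
dt = 1 / nsteps
for k in 0:nsteps-1
    t = k * dt
    x .+= dt .* u.(x, t)               # forward Euler step of dx/dt = u(x, t)
end

mean(x), std(x)                         # ≈ (μ, σ)
```

Since every trajectory of this field is a straight line, even a crude fixed-step Euler scheme follows it essentially exactly; with a learned, curved velocity field the solver step count starts to matter.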

15.2 Why flow matching is interesting

Compared with diffusion models, flow matching often gives a cleaner conceptual picture for inverse problems and conditional generation:

  • the model is a deterministic transport map rather than a stochastic reverse Markov chain,
  • sampling can use standard ODE solvers,
  • and the conditioning logic fits naturally with transport from a prior toward an observation-consistent posterior.

In practice, diffusion and flow matching are closely related. Both learn time-dependent transformations from simple noise to complex data. The difference is mostly in whether the learned process is framed as denoising a stochastic corruption or integrating a deterministic flow.

15.3 Code example: transporting Gaussian noise into a bimodal porosity prior

We reuse the same synthetic porosity distribution as in the diffusion chapter, but now train a velocity network directly. The input is a point on the interpolation path together with the time \(t\), and the output is the velocity that should move that point toward the target distribution.

using Lux, Random, Optimisers, Zygote, Statistics, Printf, CairoMakie

rng = Xoshiro(42)

function sample_porosity(rng, n)
    values = zeros(Float32, n)
    for i in 1:n
        if rand(rng) < 0.55f0
            values[i] = clamp(0.27f0 + 0.025f0 * randn(rng, Float32), 0.16f0, 0.36f0)
        else
            values[i] = clamp(0.09f0 + 0.015f0 * randn(rng, Float32), 0.03f0, 0.14f0)
        end
    end
    return reshape(values, 1, :)
end

n_data = 768
x1_data = sample_porosity(rng, n_data)
μ_data = mean(x1_data)
σ_data = std(x1_data)
x1_scaled = (x1_data .- μ_data) ./ σ_data
1×768 Matrix{Float32}:
 -1.16637  0.702224  -1.03697  -0.924098  …  1.36255  -1.12927  -0.992214
# Tiny MLP: point on the path and time -> velocity
velocity_net = Chain(
    Dense(2 => 32, tanh),
    Dense(32 => 32, tanh),
    Dense(32 => 1)
)

ps, st = Lux.setup(rng, velocity_net)
opt_state = Optimisers.setup(Adam(0.005f0), ps)

function flow_matching_loss(ps, x0_batch, x1_batch, t_batch)
    x_t = (1 .- t_batch) .* x0_batch .+ t_batch .* x1_batch
    u_t = x1_batch .- x0_batch
    inputs = vcat(x_t, t_batch)
    v̂, _ = velocity_net(inputs, ps, st)
    return mean((v̂ .- u_t) .^ 2)
end
flow_matching_loss (generic function with 1 method)
batch_size = 128

for epoch in 1:400
    idx = rand(rng, 1:size(x1_scaled, 2), batch_size)
    x1_batch = x1_scaled[:, idx]
    x0_batch = randn(rng, Float32, 1, batch_size)
    t_batch = rand(rng, Float32, 1, batch_size)

    loss, grads = Zygote.withgradient(ps) do p
        flow_matching_loss(p, x0_batch, x1_batch, t_batch)
    end

    opt_state, ps = Optimisers.update(opt_state, ps, grads[1])

    if epoch == 1 || epoch % 100 == 0
        @printf "Epoch %3d  flow-matching loss = %.6f\n" epoch loss
    end
end
Epoch   1  flow-matching loss = 2.732322
Epoch 100  flow-matching loss = 1.153513
Epoch 200  flow-matching loss = 1.656029
Epoch 300  flow-matching loss = 1.500055
Epoch 400  flow-matching loss = 1.196840
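The loss does not approach zero, and it should not: given only \(\mathbf{x}_t\), the endpoint pair \((\mathbf{x}_0, \mathbf{x}_1)\) is not identifiable, so even the optimal predictor \(\mathbb{E}[\mathbf{u}_t \mid \mathbf{x}_t]\) incurs the conditional variance of \(\mathbf{u}_t\). A standalone Monte Carlo sketch (our own addition, reusing no state from the training loop) bounds this floor from above by the second moment of \(\mathbf{u}_t\) under the constant-zero predictor:

```julia
using Random, Statistics

rng = Xoshiro(0)
n = 100_000
x0 = randn(rng, Float32, n)   # base samples from p0 = N(0, 1)
x1 = randn(rng, Float32, n)   # standardized data is also roughly unit variance
u = x1 .- x0                  # straight-line target velocity

# MSE of the trivial predictor v̂ ≡ 0: Var(x1) + Var(x0) ≈ 2.
# Any learned v̂(x_t, t) only needs to beat this baseline, not reach zero.
baseline = mean(u .^ 2)       # ≈ 2
```

A training loss in the 1.2 to 1.7 range, as printed above, therefore indicates the network is beating the trivial baseline of roughly 2, not that optimization has stalled.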
# Sample by integrating the learned ODE from t = 0 to t = 1
function velocity_prediction(x, t, ps)
    t_feature = fill(Float32(t), 1, size(x, 2))
    inputs = vcat(x, t_feature)
    v̂, _ = velocity_net(inputs, ps, st)
    return v̂
end

n_samples = 600
x = randn(rng, Float32, 1, n_samples)
n_solver_steps = 60
dt = 1.0f0 / n_solver_steps

for step in 0:n_solver_steps-1
    t = step * dt
    x .+= dt .* velocity_prediction(x, t, ps)
end

generated_porosity = clamp.(x .* σ_data .+ μ_data, 0.0f0, 0.4f0)

fig = Figure(size = (620, 320))
ax1 = Axis(fig[1, 1], title = "Training data",
           xlabel = "Porosity", ylabel = "Count")
hist!(ax1, vec(x1_data), bins = 35, color = (:black, 0.55))

ax2 = Axis(fig[1, 2], title = "Flow-matching samples",
           xlabel = "Porosity", ylabel = "Count")
hist!(ax2, vec(generated_porosity), bins = 35, color = (:seagreen, 0.65))

Label(fig[0, :], "Flow matching: transport from Gaussian noise to porosity prior", fontsize = 16)
fig

The learned histogram should approximate the two porosity modes again, but the sampling mechanism is now different: there is no reverse noise-removal chain. We simply integrate a learned velocity field from noise to data.

15.4 When to use flow matching

Flow matching is especially appealing when:

  • You want a continuous-time transport view of generation.
  • You expect to reuse the model inside conditioning, inversion, or data-assimilation workflows.
  • Deterministic ODE-based sampling is easier to reason about than stochastic reverse diffusion.
  • You care about generation speed: a well-trained flow can often be sampled accurately with fewer ODE solver steps than a diffusion model needs denoising steps.

The main tradeoff is that the model must learn a good global velocity field. If that field is poor, ODE integration can drift into unrealistic regions of state space.

15.5 Geoscience applications

  • Posterior transport for inverse problems — flow matching is a natural way to move samples from a simple prior toward complex posterior ensembles in ill-posed geophysical inference.
  • Conditional geological generation — deterministic transport maps can generate facies, permeability, or structural models conditioned on sparse control data while preserving multimodality.
  • Fast sampling inside iterative workflows — because flow models can often use relatively few solver steps, they are attractive when generation must be repeated many times inside inversion or uncertainty-propagation loops.
  • Data assimilation and field reconstruction — flow matching provides a clean framework for transporting coarse or noisy fields toward high-resolution, observation-consistent realizations.
  • Bridging latent priors and physics-aware models — the transport viewpoint fits naturally with later scientific machine learning chapters, where priors, constraints, and inverse objectives are combined.
  • Overview — flow matching is newer than GANs and diffusion in geoscience, but it is well aligned with the uncertainty-aware and inverse-problem-driven perspective emphasized in Bergen et al. (2019) and Dramsch (2020).