15  Flow Matching

Tip: Key references
  • Neural ODEs — the continuous-time viewpoint where a neural network parameterizes a vector field and data evolves through an ordinary differential equation (Chen et al., 2018).
  • Continuous normalizing flows — scalable continuous-time generative flows that connect latent variables to data through learned dynamics (Grathwohl et al., 2019).
  • Flow matching — the modern training objective that learns a transport vector field directly from simple reference paths between noise and data (Lipman et al., 2023).

A flow-matching model learns a continuous vector field that transports samples from a simple base distribution, usually Gaussian noise, to the target data distribution. Instead of adding noise and then reversing it as in diffusion models, flow matching learns the velocity field of a probability flow.

This perspective is useful because it turns generative modeling into a transport problem. If we know how samples should move at every time \(t \in [0, 1]\), then generation becomes a matter of integrating an ordinary differential equation from noise to data.

15.1 The basic idea

Suppose \(\mathbf{x}_0 \sim p_0\) is a simple reference sample and \(\mathbf{x}_1 \sim p_1\) is a target data sample. We define an interpolation path between them, for example the straight-line path

\[ \mathbf{x}_t = (1 - t)\,\mathbf{x}_0 + t\,\mathbf{x}_1, \qquad t \in [0, 1]. \]

For this path, the ideal velocity is simply

\[ \mathbf{u}_t = \frac{d\mathbf{x}_t}{dt} = \mathbf{x}_1 - \mathbf{x}_0. \]
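As a quick numerical sanity check (a standalone sketch with arbitrary endpoint values, not part of the chapter's training code), a central difference on the straight-line path recovers this constant velocity at any \(t\):

```julia
x0, x1 = -1.0, 2.0                  # arbitrary endpoint pair
path(t) = (1 - t) * x0 + t * x1     # straight-line interpolation

# Central-difference estimate of d(path)/dt at time t
fd(t; h = 1e-6) = (path(t + h) - path(t - h)) / (2h)

fd(0.1), fd(0.5), fd(0.9)           # each ≈ x1 - x0 = 3.0, independent of t
```

Because the path is linear in \(t\), the velocity is the same at every point along it; only the endpoints matter.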

Flow matching trains a neural network \(\mathbf{v}_\theta(\mathbf{x}, t)\) to predict that velocity from points sampled along the path. A simple objective is

\[ \mathcal{L}(\theta) = \mathbb{E}_{\mathbf{x}_0,\mathbf{x}_1,t}\left[\left\|\mathbf{v}_\theta(\mathbf{x}_t, t) - \mathbf{u}_t\right\|_2^2\right]. \]

After training, generation solves the ODE

\[ \frac{d\mathbf{x}}{dt} = \mathbf{v}_\theta(\mathbf{x}, t), \qquad \mathbf{x}(0) \sim p_0. \]

The learned dynamics push the base noise distribution toward the data distribution.
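Before training anything, the mechanics can be seen with a velocity field known in closed form. Under the deterministic coupling \(\mathbf{x}_1 = \mu + \sigma\,\mathbf{x}_0\) between \(p_0 = \mathcal{N}(0, 1)\) and \(p_1 = \mathcal{N}(\mu, \sigma^2)\), the straight-line path gives \(\mathbf{x}_t = (1 + t(\sigma - 1))\,\mathbf{x}_0 + t\mu\), so \(\mathbf{x}_0\) is recoverable from \((\mathbf{x}, t)\) and the velocity \(\mathbf{u} = \mu + (\sigma - 1)\,\mathbf{x}_0\) can be written explicitly. The following toy sketch (our own illustration, with hypothetical values of \(\mu\) and \(\sigma\)) integrates that field with forward Euler and transports unit Gaussian samples to \(\mathcal{N}(\mu, \sigma^2)\):

```julia
using Random, Statistics

μ, σ = 0.25, 0.05                      # hypothetical target mean and spread

# Closed-form velocity for the coupled path x_t = (1 + t(σ - 1)) x0 + t μ
u(x, t) = μ + (σ - 1) * (x - t * μ) / (1 + t * (σ - 1))

rng = Xoshiro(1)
x = randn(rng, 5_000)                  # samples from p0 = N(0, 1)
nsteps = 200
dt = 1 / nsteps
for k in 0:nsteps-1
    t = k * dt
    x .+= dt .* u.(x, t)               # forward Euler step of dx/dt = u(x, t)
end

mean(x), std(x)                         # ≈ (μ, σ)
```

Since every trajectory of this field is a straight line, even a crude fixed-step Euler scheme follows it essentially exactly; with a learned, curved velocity field the solver step count starts to matter.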

15.2 Why flow matching is interesting

Compared with diffusion models, flow matching often gives a cleaner conceptual picture for inverse problems and conditional generation:

  • the model is a deterministic transport map rather than a stochastic reverse Markov chain,
  • sampling can use standard ODE solvers,
  • and the conditioning logic fits naturally with transport from a prior toward an observation-consistent posterior.

In practice, diffusion and flow matching are closely related. Both learn time-dependent transformations from simple noise to complex data. The difference is mostly in whether the learned process is framed as denoising a stochastic corruption or integrating a deterministic flow.

15.3 Code example: transporting Gaussian noise into a bimodal porosity prior

We reuse the same synthetic porosity distribution as in the diffusion chapter, but now train a velocity network directly. The input is a point on the interpolation path together with the time \(t\), and the output is the velocity that should move that point toward the target distribution.

using Lux, Random, Optimisers, Zygote, Statistics, Printf, CairoMakie

rng = Xoshiro(42)

function sample_porosity(rng, n)
    values = zeros(Float32, n)
    for i in 1:n
        if rand(rng) < 0.55f0
            values[i] = clamp(0.27f0 + 0.025f0 * randn(rng, Float32), 0.16f0, 0.36f0)
        else
            values[i] = clamp(0.09f0 + 0.015f0 * randn(rng, Float32), 0.03f0, 0.14f0)
        end
    end
    return reshape(values, 1, :)
end

n_data = 768
x1_data = sample_porosity(rng, n_data)
μ_data = mean(x1_data)
σ_data = std(x1_data)
x1_scaled = (x1_data .- μ_data) ./ σ_data
1×768 Matrix{Float32}:
 -1.16637  0.702224  -1.03697  -0.924098  …  1.36255  -1.12927  -0.992214
# Tiny MLP: point on the path and time -> velocity
velocity_net = Chain(
    Dense(2 => 32, tanh),
    Dense(32 => 32, tanh),
    Dense(32 => 1)
)

ps, st = Lux.setup(rng, velocity_net)
opt_state = Optimisers.setup(Adam(0.005f0), ps)

function flow_matching_loss(ps, x0_batch, x1_batch, t_batch)
    x_t = (1 .- t_batch) .* x0_batch .+ t_batch .* x1_batch
    u_t = x1_batch .- x0_batch
    inputs = vcat(x_t, t_batch)
    v̂, _ = velocity_net(inputs, ps, st)
    return mean((v̂ .- u_t) .^ 2)
end
flow_matching_loss (generic function with 1 method)
batch_size = 128

for epoch in 1:400
    idx = rand(rng, 1:size(x1_scaled, 2), batch_size)
    x1_batch = x1_scaled[:, idx]
    x0_batch = randn(rng, Float32, 1, batch_size)
    t_batch = rand(rng, Float32, 1, batch_size)

    loss, grads = Zygote.withgradient(ps) do p
        flow_matching_loss(p, x0_batch, x1_batch, t_batch)
    end

    opt_state, ps = Optimisers.update(opt_state, ps, grads[1])

    if epoch == 1 || epoch % 100 == 0
        @printf "Epoch %3d  flow-matching loss = %.6f\n" epoch loss
    end
end
Epoch   1  flow-matching loss = 2.732322
Epoch 100  flow-matching loss = 1.153513
Epoch 200  flow-matching loss = 1.656029
Epoch 300  flow-matching loss = 1.500055
Epoch 400  flow-matching loss = 1.196840
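The loss does not approach zero, and it should not: given only \(\mathbf{x}_t\), the endpoint pair \((\mathbf{x}_0, \mathbf{x}_1)\) is not identifiable, so even the optimal predictor \(\mathbb{E}[\mathbf{u}_t \mid \mathbf{x}_t]\) incurs the conditional variance of \(\mathbf{u}_t\). A standalone Monte Carlo sketch (our own addition, reusing no state from the training loop) bounds this floor from above by the second moment of \(\mathbf{u}_t\) under the constant-zero predictor:

```julia
using Random, Statistics

rng = Xoshiro(0)
n = 100_000
x0 = randn(rng, Float32, n)   # base samples from p0 = N(0, 1)
x1 = randn(rng, Float32, n)   # standardized data is also roughly unit variance
u = x1 .- x0                  # straight-line target velocity

# MSE of the trivial predictor v̂ ≡ 0: Var(x1) + Var(x0) ≈ 2.
# Any learned v̂(x_t, t) only needs to beat this baseline, not reach zero.
baseline = mean(u .^ 2)       # ≈ 2
```

A training loss in the 1.2 to 1.7 range, as printed above, therefore indicates the network is beating the trivial baseline of roughly 2, not that optimization has stalled.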
# Sample by integrating the learned ODE from t = 0 to t = 1
function velocity_prediction(x, t, ps)
    t_feature = fill(Float32(t), 1, size(x, 2))
    inputs = vcat(x, t_feature)
    v̂, _ = velocity_net(inputs, ps, st)
    return v̂
end

n_samples = 600
x = randn(rng, Float32, 1, n_samples)
n_solver_steps = 60
dt = 1.0f0 / n_solver_steps

for step in 0:n_solver_steps-1
    t = step * dt
    x .+= dt .* velocity_prediction(x, t, ps)
end

generated_porosity = clamp.(x .* σ_data .+ μ_data, 0.0f0, 0.4f0)

fig = Figure(size = (620, 320))
ax1 = Axis(fig[1, 1], title = "Training data",
           xlabel = "Porosity", ylabel = "Count")
hist!(ax1, vec(x1_data), bins = 35, color = (:black, 0.55))

ax2 = Axis(fig[1, 2], title = "Flow-matching samples",
           xlabel = "Porosity", ylabel = "Count")
hist!(ax2, vec(generated_porosity), bins = 35, color = (:seagreen, 0.65))

Label(fig[0, :], "Flow matching: transport from Gaussian noise to porosity prior", fontsize = 16)
fig

The learned histogram should approximate the two porosity modes again, but the sampling mechanism is now different: there is no reverse noise-removal chain. We simply integrate a learned velocity field from noise to data.

15.4 When to use flow matching

Flow matching is especially appealing when:

  • You want a continuous-time transport view of generation.
  • You expect to reuse the model inside conditioning, inversion, or data-assimilation workflows.
  • Deterministic ODE-based sampling is easier to reason about than stochastic reverse diffusion.
  • You care about generation speed: a well-trained flow can often be sampled accurately with fewer ODE solver steps than a diffusion model needs denoising steps.

The main tradeoff is that the model must learn a good global velocity field. If that field is poor, ODE integration can drift into unrealistic regions of state space.

15.5 Geoscience applications

  • Posterior transport for inverse problems — flow matching is a natural way to move samples from a simple prior toward complex posterior ensembles in ill-posed geophysical inference.
  • Conditional geological generation — deterministic transport maps can generate facies, permeability, or structural models conditioned on sparse control data while preserving multimodality.
  • Fast sampling inside iterative workflows — because flow models can often use relatively few solver steps, they are attractive when generation must be repeated many times inside inversion or uncertainty-propagation loops.
  • Data assimilation and field reconstruction — flow matching provides a clean framework for transporting coarse or noisy fields toward high-resolution, observation-consistent realizations.
  • Bridging latent priors and physics-aware models — the transport viewpoint fits naturally with later scientific machine learning chapters, where priors, constraints, and inverse objectives are combined.
  • Overview — flow matching is newer than GANs and diffusion in geoscience, but it is well aligned with the uncertainty-aware and inverse-problem-driven perspective emphasized in Bergen et al. (2019) and Dramsch (2020).