using Lux, Random, Optimisers, Zygote, Statistics, Printf, CairoMakie
rng = Xoshiro(42)
19 DeepONet
- DeepONet — the foundational neural operator based on the universal approximation theorem for operators, using a branch-trunk architecture (Lu, Jin, et al., 2021).
- Universal approximation of operators — the mathematical theorem guaranteeing that a two-sub-network architecture can approximate any continuous nonlinear operator (Chen & Chen, 1995).
- DeepXDE — a library for PINNs and DeepONets that formalized many practical training strategies (Lu, Meng, et al., 2021).
- Neural operator survey — a comprehensive mathematical treatment of neural operators and their approximation properties (Kovachki et al., 2023).
- Neural operators for science — review of neural operators accelerating scientific simulations (Azizzadenesheli et al., 2024).
In the PINN chapter we trained a network to solve one instance of a PDE. Change the boundary conditions, initial conditions, or source term, and you must retrain from scratch. DeepONet (Deep Operator Network) solves a fundamentally more ambitious problem: it learns the solution operator — the mapping from input functions to output functions — so that after training, it can instantly predict solutions for any new input without retraining.
19.1 The operator learning problem
In many PDE problems, the solution \(u\) depends on an input function \(a\) (a forcing term, initial condition, boundary condition, or material property):
\[ \mathcal{N}[u; a] = 0 \quad \implies \quad u = \mathcal{G}(a) \]
where \(a \in \mathcal{A}\) is an input function, \(u \in \mathcal{U}\) is the corresponding output function, and \(\mathcal{G}: \mathcal{A} \to \mathcal{U}\) is the solution operator. Writing \(\mathcal{G}(a)(y)\) means: first apply the operator to the whole input function \(a\), then evaluate the resulting output function at the query location \(y\). A traditional solver computes \(\mathcal{G}(a)\) one \(a\) at a time. DeepONet learns \(\mathcal{G}_\theta \approx \mathcal{G}\) from data — pairs \(\{(a_i, u_i)\}_{i=1}^{N}\) — so that inference on new inputs is a single forward pass.
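In practice, both \(a_i\) and \(u_i\) are stored as finite vectors of point evaluations on fixed grids. A minimal sketch of one training pair for the antiderivative operator \(\mathcal{G}(a)(y) = \int_0^y a(s)\,ds\) (the grid sizes and function choices here are illustrative, not the ones used later in this chapter):

```julia
# One training pair (a, u) for G(a)(y) = ∫₀ʸ a(s) ds, stored as point values
sensors = range(0, 1, length = 5)   # where the input function is sampled
queries = range(0, 1, length = 5)   # where the output function is evaluated
a(x) = 2x                           # one input function
u(y) = y^2                          # its image under G (exact antiderivative)
a_vec = Float32.(a.(sensors))       # finite-dimensional stand-in for the function a
u_vec = Float32.(u.(queries))       # finite-dimensional stand-in for the function u
```

The network never sees the functions themselves, only these vectors of samples.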
19.2 The universal approximation theorem for operators
Chen & Chen (1995) proved that a network with two sub-networks can approximate any continuous nonlinear operator to arbitrary accuracy. This theorem provides the mathematical foundation for DeepONet.
Theorem (informal): For any continuous operator \(\mathcal{G}: \mathcal{A} \to \mathcal{U}\) and any \(\epsilon > 0\), there exist neural networks (branch and trunk) such that:
\[ \bigl|\mathcal{G}(a)(y) - \mathcal{G}_\theta(a)(y)\bigr| < \epsilon \]
for all input functions \(a\) in a compact subset of \(\mathcal{A}\) and all query locations \(y\) in a compact domain. (Compactness of both sets is part of the theorem's hypotheses.)
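Concretely, Chen & Chen's construction is a sum of products of two sub-networks: one acting on the sampled values of \(a\), the other on \(y\). In the notation used below (branch outputs \(b_k\), trunk outputs \(\tau_k\)), the approximant has the form

\[ \mathcal{G}_\theta(a)(y) = \sum_{k=1}^{p} b_k\bigl(a(x_1), \ldots, a(x_m)\bigr) \, \tau_k(y), \]

where \(x_1, \ldots, x_m\) are fixed sensor locations. The trainable bias \(b_0\) used in DeepONet is a practical addition on top of this theoretical form.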
19.3 Architecture: branch and trunk
DeepONet decomposes the operator approximation into two learnable components:
Input function a(x)                  Query location y
evaluated at m sensors               (where to evaluate the output)

[a(x₁), a(x₂), ..., a(xₘ)]           (y₁, y₂, ..., y_d)
         │                                   │
         ▼                                   ▼
   ┌───────────┐                       ┌───────────┐
   │  Branch   │                       │   Trunk   │
   │  Network  │                       │  Network  │
   └─────┬─────┘                       └─────┬─────┘
         │                                   │
 [b₁, b₂, ..., bₚ]                   [τ₁, τ₂, ..., τₚ]
         │                                   │
         └───────────────┬───────────────────┘
                         │  dot product
                         ▼
         G_θ(a)(y) = Σₖ bₖ · τₖ + bias
\[ \mathcal{G}_\theta(a)(y) = \sum_{k=1}^{p} b_k(a) \cdot \tau_k(y) + b_0 \]
Branch network — takes the input function \(a\) sampled at \(m\) fixed sensor locations \(\{x_1, \ldots, x_m\}\) and outputs \(p\) coefficients \(\{b_1, \ldots, b_p\}\). Think of these as the “expansion coefficients” of the output in a learned basis.
Trunk network — takes the query location \(y\) (where we want to evaluate the output) and produces \(p\) basis functions \(\{\tau_1, \ldots, \tau_p\}\).
The final output is the inner product of the branch and trunk outputs plus a trainable bias. This split is elegant: the branch encodes what the input function looks like, while the trunk supplies a learned basis over the query locations. In other words, the branch depends only on the input function \(a\), and the trunk depends only on the query location \(y\).
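The combination step itself is just a dot product per query point, independent of how the branch and trunk are implemented. A self-contained sketch with illustrative dimensions and made-up branch/trunk outputs:

```julia
p = 4                              # latent dimension shared by branch and trunk
b = Float32[0.5, -1.0, 2.0, 0.25]  # branch output for one input function a
T = ones(Float32, p, 3)            # trunk outputs at 3 query points, shape (p, n_q)
b0 = 0.1f0                         # trainable scalar bias
# G_θ(a)(yⱼ) = Σₖ bₖ τₖ(yⱼ) + b₀ for each query point yⱼ, as one matrix product
G_vals = vec(b' * T) .+ b0
```

Evaluating the operator at more query points only adds columns to `T`; the branch output `b` is computed once per input function.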
19.4 Variants
- Unstacked DeepONet — single branch, single trunk (described above). The most common variant.
- Stacked DeepONet — multiple independent branch-trunk pairs summed together.
- POD-DeepONet — the trunk basis functions are replaced by Proper Orthogonal Decomposition (POD) modes computed from the training data, making the trunk data-driven rather than learned.
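For POD-DeepONet, the trunk basis comes from a singular value decomposition of the centered matrix of training output snapshots rather than from a trained network. A minimal self-contained sketch with synthetic snapshot data (sizes and variable names are illustrative):

```julia
using LinearAlgebra, Statistics, Random
rng_pod = Xoshiro(0)
U_snap = randn(rng_pod, 100, 50)         # 50 training outputs on 100 grid points
U_c = U_snap .- mean(U_snap, dims = 2)   # center across samples
F = svd(U_c)
p_pod = 8
pod_modes = F.U[:, 1:p_pod]              # first p left singular vectors = trunk basis
```

The branch network is then trained to predict the coefficients of the output in this fixed, orthonormal basis.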
19.5 Code example: learning the antiderivative operator
We train a DeepONet to learn the antiderivative operator:
\[ \mathcal{G}(a)(y) = \int_0^y a(s)\, ds \]
Given a function \(a(x)\), the operator returns its antiderivative evaluated at any query point \(y\).
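Before training anything, it helps to verify the trapezoidal integration we will use for data generation against a case with a known antiderivative. A quick self-contained sanity check, separate from the training pipeline (function and variable names here are local to this check):

```julia
# For g(s) = cos(s), the exact antiderivative is G(g)(y) = sin(y)
g(s) = cos(s)
yq = 0.7
xs = range(0, yq, length = 1001)
gv = g.(xs)
# Composite trapezoidal rule on a uniform grid of spacing step(xs)
trap = sum(0.5 .* (gv[1:end-1] .+ gv[2:end]) .* step(xs))
```

With 1000 intervals the trapezoidal error is O(h²), far below the accuracy we need for training targets.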
# Data generation: random functions and their antiderivatives
n_sensors = 50 # fixed sensor locations for branch input
n_query = 50 # query points per sample
n_train = 2000
n_test = 200
x_sensors = Float32.(range(0, 1, length = n_sensors))
y_query = Float32.(range(0, 1, length = n_query))
dx_sensor = x_sensors[2] - x_sensors[1]
function random_function(rng)
    # Random sum of sinusoids
    n_modes = rand(rng, 2:5)
    coeffs = randn(rng, Float32, n_modes)
    freqs = Float32.(rand(rng, 1:6, n_modes))
    phases = 2π .* rand(rng, Float32, n_modes)
    function f(x)
        val = 0.0f0
        for j in 1:n_modes
            val += coeffs[j] * sin(2π * freqs[j] * x + phases[j])
        end
        return val
    end
    return f
end
function generate_data(rng, n_samples)
    A = zeros(Float32, n_sensors, n_samples)    # branch inputs
    Y = zeros(Float32, 1, n_query * n_samples)  # query locations
    G = zeros(Float32, 1, n_query * n_samples)  # true outputs
    for i in 1:n_samples
        f = random_function(rng)
        # Evaluate at the fixed sensor locations
        A[:, i] = f.(x_sensors)
        # Compute the antiderivative at each query point via the trapezoidal rule
        for (j, yj) in enumerate(y_query)
            # Integrate from 0 to yj
            n_int = max(2, round(Int, yj / dx_sensor) + 1)
            x_int = Float32.(range(0, yj, length = n_int))
            f_int = f.(x_int)
            integral = sum(0.5f0 * (f_int[1:end-1] .+ f_int[2:end]) .* diff(x_int))
            idx = (i - 1) * n_query + j
            Y[1, idx] = yj
            G[1, idx] = integral
        end
    end
    return A, Y, G
end
A_train, Y_train, G_train = generate_data(rng, n_train)
A_test, Y_test, G_test = generate_data(Xoshiro(123), n_test)
# Build DeepONet: Branch + Trunk
p = 64 # output dimension of both branch and trunk
branch_net = Chain(
    Dense(n_sensors => 128, relu),
    Dense(128 => 128, relu),
    Dense(128 => p)
)
trunk_net = Chain(
    Dense(1 => 128, relu),
    Dense(128 => 128, relu),
    Dense(128 => p)
)
ps_branch, st_branch = Lux.setup(rng, branch_net)
ps_trunk, st_trunk = Lux.setup(rng, trunk_net)
bias = zeros(Float32, 1)
ps_all = (branch = ps_branch, trunk = ps_trunk, bias = bias)
st_all = (branch = st_branch, trunk = st_trunk)
# Forward pass: DeepONet output
function deeponet_forward(A, Y, ps, st, n_samples, n_q)
    # Branch: process all input functions
    b_out, st_b = branch_net(A, ps.branch, st.branch)    # (p, n_samples)
    # Trunk: process all query locations
    t_out, st_t = trunk_net(Y, ps.trunk, st.trunk)       # (p, n_samples * n_q)
    # Tile the branch output so each sample's coefficients line up with its
    # n_q query points: `repeat` with `inner` repeats each column n_q times
    # consecutively, matching the query layout. Unlike writing into a
    # preallocated array, this is differentiable under Zygote.
    b_tiled = repeat(b_out, inner = (1, n_q))            # (p, n_samples * n_q)
    # Dot product along the p dimension, plus bias
    output = sum(b_tiled .* t_out, dims = 1) .+ ps.bias  # (1, n_samples * n_q)
    st_new = (branch = st_b, trunk = st_t)
    return output, st_new
end
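The tiling is the only subtle reshape in the forward pass, so it is worth checking how `repeat` with `inner` lays out columns. A quick self-contained check with a tiny stand-in branch output:

```julia
B = [1 2; 3 4]                     # stand-in branch output: p = 2, n_samples = 2
tiled = repeat(B, inner = (1, 3))  # each sample's column repeated 3 times in a row
# Column layout now matches the query layout: all of sample 1's queries first,
# then all of sample 2's
```

Note that `repeat(B, 1, 3)` (outer repetition) would interleave the samples as [b₁ b₂ b₁ b₂ ...], which does not match how the query points are stacked.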
# Loss function
function deeponet_loss(ps, st)
    pred, st_new = deeponet_forward(A_train, Y_train, ps, st, n_train, n_query)
    loss = mean((pred .- G_train) .^ 2)
    return loss, st_new
end
# Training loop
opt = Adam(0.001f0)
opt_state = Optimisers.setup(opt, ps_all)
for epoch in 1:1000
    # Needed so assignments update the globals when this runs as a script
    global ps_all, st_all, opt_state
    (loss_val, st_new), grads = Zygote.withgradient(ps_all) do ps
        deeponet_loss(ps, st_all)
    end
    opt_state, ps_all = Optimisers.update(opt_state, ps_all, grads[1])
    st_all = st_new
    if epoch == 1 || epoch % 200 == 0
        # Evaluate on the held-out test set
        pred_test, _ = deeponet_forward(A_test, Y_test, ps_all, st_all, n_test, n_query)
        test_mse = mean((pred_test .- G_test) .^ 2)
        @printf "Epoch %4d  Train MSE = %.6f  Test MSE = %.6f\n" epoch loss_val test_mse
    end
end
# Visualize predictions on test samples
fig = Figure(size = (700, 500))
n_show = 4
for i in 1:n_show
    a_sensor_vals = A_test[:, i]   # sensor values of the i-th test input function
    idx_range = (i-1)*n_query+1 : i*n_query
    true_vals = vec(G_test[:, idx_range])
    pred_vals_all, _ = deeponet_forward(
        reshape(a_sensor_vals, :, 1),
        reshape(Y_test[:, idx_range], 1, :),
        ps_all, st_all, 1, n_query
    )
    pred_vals = vec(pred_vals_all)
    row = (i - 1) ÷ 2 + 1
    col = (i - 1) % 2 + 1
    ax = Axis(fig[row, col], xlabel = "y", ylabel = "∫₀ʸ a(s) ds",
        title = "Test sample $i")
    lines!(ax, y_query, true_vals, color = :black, linewidth = 2, label = "Exact")
    lines!(ax, y_query, pred_vals, color = :steelblue, linewidth = 2,
        linestyle = :dash, label = "DeepONet")
    if i == 1
        axislegend(ax, position = :lt, labelsize = 10)
    end
end
Label(fig[0, :], "DeepONet: learning the antiderivative operator", fontsize = 14)
fig
19.6 Key properties of DeepONet
19.6.1 Strengths
- Mathematical foundation — grounded in the universal approximation theorem for operators, providing theoretical guarantees on expressiveness.
- Flexible geometry — unlike FNO (which requires regular grids for the FFT), DeepONet works with arbitrary input sensor locations and query points. This is crucial for geoscience data, which is rarely on regular grids.
- Modular design — the branch and trunk are independent networks that can be customized separately. For example, the branch could be a CNN if the input is an image, or an RNN if the input is a time series.
- Point-wise evaluation — the trunk network produces output at individual query points, making it efficient for problems where you only need the solution at specific locations.
19.6.2 Limitations
- Fixed sensor locations — the branch network requires the input function to be evaluated at the same fixed sensor locations for all samples. This can be restrictive if data comes from different measurement configurations.
- Data hungry — DeepONet typically requires more training data than FNO for the same accuracy on problems with regular grids, because it doesn’t exploit spatial structure the way spectral methods do.
- Scaling — for high-dimensional output functions, the number of trunk basis functions \(p\) may need to be large, increasing the computational cost.
19.7 DeepONet vs FNO
| Feature | DeepONet | FNO |
|---|---|---|
| Architecture | Branch + Trunk | Spectral convolution layers |
| Grid requirements | Arbitrary (irregular OK) | Regular grid (for FFT) |
| Theoretical basis | Universal approximation theorem | Kernel integral operators |
| Resolution invariance | Through trunk network | Through Fourier representation |
| Best for | Irregular data, point queries | Regular-grid data, global patterns |
| Data efficiency | Moderate | Higher (exploits spatial structure) |
19.8 Geoscience applications
DeepONet is particularly well-suited for geoscience problems because field data is rarely on regular grids:
- Subsurface flow surrogates — DeepONet can learn the mapping from permeability fields to pressure/saturation solutions, enabling real-time uncertainty quantification for reservoir management.
- Seismic wave modeling — given a velocity model as input, DeepONet predicts the wavefield at arbitrary receiver locations, replacing expensive wave-equation solves during inversion.
- Well log prediction — the branch network ingests spatially irregular borehole measurements while the trunk evaluates predictions at arbitrary subsurface locations.
- Climate downscaling — DeepONet maps coarse-resolution climate model output to fine-resolution fields, handling the irregular observation network naturally.
A powerful extension of DeepONet incorporates PDE constraints into the training loss, dramatically reducing the amount of training data needed. We explore this in the next chapter.