19  Physics-Informed DeepONet

Key references
  • Physics-Informed DeepONets — the foundational paper combining DeepONet with PDE residual constraints, enabling operator learning with little or no labeled data (S. Wang et al., 2021).
  • DeepONet — the underlying architecture based on the universal approximation theorem for operators (Lu et al., 2021).
  • PI-DeepONet for fracture mechanics — application of physics-informed operator learning to complex multi-physics problems (Goswami et al., 2023).
  • Reliable extrapolation — improving neural operator generalization through physics constraints and sparse observations (M. Wang et al., 2023).
  • Physics-informed ML review — comprehensive treatment of PINNs, neural operators, and hybrid approaches (Karniadakis et al., 2021).

The DeepONet chapter showed how to learn a solution operator from input–output function pairs. But generating those pairs requires running a traditional PDE solver many times — which can be expensive. Physics-Informed DeepONet (PI-DeepONet) (S. Wang et al., 2021) eliminates or reduces this requirement by embedding the PDE directly into the DeepONet training loss, just as PINNs embed the PDE into a single-instance solver.

19.1 The key idea

Recall the DeepONet prediction:

\[ \mathcal{G}_\theta(a)(y) = \sum_{k=1}^{p} b_k(a) \cdot \tau_k(y) + b_0 \]

In a standard DeepONet, we train by minimizing the data loss:

\[ \mathcal{L}_{\mathrm{data}}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{Q} \bigl|\mathcal{G}_\theta(a_i)(y_j) - u_i(y_j)\bigr|^2 \]

where the \(u_i\) are computed by a PDE solver. In PI-DeepONet, we replace (or supplement) this with a PDE residual loss: since the DeepONet output is differentiable with respect to the query location \(y\), we can compute \(\frac{\partial \mathcal{G}_\theta}{\partial y}\), \(\frac{\partial^2 \mathcal{G}_\theta}{\partial y^2}\), etc. via automatic differentiation and penalize violations of the PDE.

19.2 PI-DeepONet loss function

\[ \mathcal{L}(\theta) = \lambda_r \mathcal{L}_r(\theta) + \lambda_{\mathrm{bc}} \mathcal{L}_{\mathrm{bc}}(\theta) \]

with

\[ \mathcal{L}_r(\theta) = \frac{1}{N \cdot N_c}\sum_{i=1}^{N}\sum_{j=1}^{N_c}\bigl|\mathcal{N}[\mathcal{G}_\theta(a_i)](y_j^c) - a_i(y_j^c)\bigr|^2 \]

\[ \mathcal{L}_{\mathrm{bc}}(\theta) = \frac{1}{N \cdot N_{\mathrm{bc}}}\sum_{i=1}^{N}\sum_{k=1}^{N_{\mathrm{bc}}}\bigl|\mathcal{G}_\theta(a_i)(y_k^{\mathrm{bc}}) - g_i(y_k^{\mathrm{bc}})\bigr|^2 \]

The crucial difference from a PINN: the loss is evaluated over many different input functions \(a_i\) simultaneously, so the network learns an operator, not just one solution. And the crucial difference from a standard DeepONet: no PDE solver is needed to generate training data — the physics is enforced directly.

19.3 When to use PI-DeepONet

                    Have training data?
                    ┌───────┴───────┐
                   Yes              No
                    │                │
              Standard           PI-DeepONet
              DeepONet           (physics only)
                    │
               Want better
               generalization?
               ┌────┴────┐
              No         Yes
               │          │
          Standard    PI-DeepONet
          DeepONet    (physics + data)

PI-DeepONet is most valuable when:

  1. No training data — you know the PDE but can’t afford to run a solver thousands of times.
  2. Limited data — you have a few solver runs and want to supplement with physics.
  3. Extrapolation — physics constraints improve generalization beyond the training distribution.

19.4 Code example: 1D Poisson operator without a solver

We learn the same operator as the previous chapter — the Poisson solution map

\[ -\frac{d^2 u}{dx^2}(x) = f(x), \quad x \in [0,1], \quad u(0) = u(1) = 0, \]

but this time we never call the finite-difference solver. The network is supervised purely by the PDE residual and the Dirichlet boundary conditions, evaluated on randomly drawn sources \(f(x)\) (see the 1D_DeepONet reference in Multiverse.jl for the data-driven counterpart).

using Lux, Random, Optimisers, Zygote, Statistics, LinearAlgebra,
      Printf, CairoMakie

rng = Xoshiro(42)

#-----shared discretization on the branch side------
n_sensors = 64
x_sensors = Float32.(range(0, 1, length = n_sensors))
dx        = x_sensors[2] - x_sensors[1]
#-----random source functions (sum of sinusoids)------
function random_source(rng)
    n_modes = rand(rng, 2:5)
    coeffs  = 0.5f0 .* randn(rng, Float32, n_modes)
    freqs   = Float32.(rand(rng, 1:5, n_modes))
    phases  = 2π .* rand(rng, Float32, n_modes)
    function f(x)
        val = 0.0f0
        for k in 1:n_modes
            val += coeffs[k] * sin(2π * freqs[k] * x + phases[k])
        end
        return val
    end
    return f
end

function sample_batch(rng, n)
    A = zeros(Float32, n_sensors, n)
    funcs = Vector{Any}(undef, n)
    for i in 1:n
        f = random_source(rng)
        A[:, i]  = f.(x_sensors)
        funcs[i] = f
    end
    return A, funcs
end
#-----branch + trunk (trunk takes scalar x in [0,1])------
p = 64

branch_net = Chain(
    Dense(n_sensors => 128, tanh),
    Dense(128 => 128, tanh),
    Dense(128 => p),
)

trunk_net = Chain(
    Dense(1   => 128, tanh),
    Dense(128 => 128, tanh),
    Dense(128 => p),
)

ps_branch, st_branch = Lux.setup(rng, branch_net)
ps_trunk,  st_trunk  = Lux.setup(rng, trunk_net)
bias_init = zeros(Float32, 1)

ps = (branch = ps_branch, trunk = ps_trunk, bias = bias_init)
st = (branch = st_branch, trunk = st_trunk)
#-----scalar DeepONet output for one (a, x) pair — used for AD wrt x------
function deeponet_scalar(a_sensors, x_scalar, ps, st)
    b_out, _ = branch_net(reshape(a_sensors, :, 1), ps.branch, st.branch)  # (p, 1)
    t_out, _ = trunk_net(reshape([x_scalar], 1, 1), ps.trunk,  st.trunk)    # (p, 1)
    return sum(b_out .* t_out) + ps.bias[1]
end
#-----PI-DeepONet loss: PDE residual + Dirichlet BCs (no solver in the loop)------
n_funcs   = 64
n_colloc  = 32

function pi_loss(ps, st)
    A_batch, funcs = sample_batch(rng, n_funcs)

    loss_pde = 0.0f0
    loss_bc  = 0.0f0
    n_pde    = 0
    n_bc     = 0

    for i in 1:n_funcs
        a_i = A_batch[:, i]
        f_i = funcs[i]

        #----- PDE residual: -u''(x) = f(x) on interior collocation points -----
        x_c = 0.02f0 .+ 0.96f0 .* rand(rng, Float32, n_colloc)
        for j in 1:n_colloc
            xj = x_c[j]
            d2u_dx2 = Zygote.gradient(
                x -> Zygote.gradient(z -> deeponet_scalar(a_i, z, ps, st), x)[1],
                xj,
            )[1]
            residual = -d2u_dx2 - f_i(xj)
            loss_pde += residual^2
            n_pde   += 1
        end

        #----- Dirichlet boundary conditions u(0) = u(1) = 0 -----
        u0 = deeponet_scalar(a_i, 0.0f0, ps, st)
        u1 = deeponet_scalar(a_i, 1.0f0, ps, st)
        loss_bc += u0^2 + u1^2
        n_bc    += 2
    end

    total = loss_pde / n_pde + 50.0f0 * loss_bc / n_bc
    return total, st
end
#-----training (per-point AD makes each step expensive — small budget)------
opt       = Adam(1f-3)
opt_state = Optimisers.setup(opt, ps)

for epoch in 1:300
    (loss_val, st_new), grads = Zygote.withgradient(ps) do p
        pi_loss(p, st)
    end
    opt_state, ps = Optimisers.update(opt_state, ps, grads[1])
    st = st_new
    if epoch == 1 || epoch % 50 == 0
        @printf "epoch %3d  loss = %.5f\n" epoch loss_val
    end
end
#-----reference solution via finite differences for comparison------
function poisson_fd(f_vals)
    n = length(f_vals); m = n - 2
    A = SymTridiagonal(fill(2.0f0, m), fill(-1.0f0, m - 1))
    u = zeros(Float32, n)
    u[2:end-1] = A \ (f_vals[2:end-1] .* dx^2)
    return u
end

fig = Figure(size = (820, 520))
for i in 1:4
    rng_test = Xoshiro(2000 + i)
    f_test   = random_source(rng_test)
    a_test   = f_test.(x_sensors)
    u_ref    = poisson_fd(a_test)

    x_eval   = Float32.(range(0, 1, length = 200))
    u_pred   = [deeponet_scalar(a_test, xj, ps, st) for xj in x_eval]

    row, col = fldmod1(i, 2)
    ax = Axis(fig[row, col], xlabel = "x", ylabel = "u(x)",
              title = "Test source $i")
    lines!(ax, x_sensors, a_test, color = (:gray, 0.5),  label = "f(x)")
    lines!(ax, x_sensors, u_ref,  color = :black,       linewidth = 2, label = "FD u(x)")
    lines!(ax, x_eval,    u_pred, color = :steelblue,   linewidth = 2,
           linestyle = :dash, label = "PI-DeepONet")
    i == 1 && axislegend(ax, position = :rt, labelsize = 9)
end
Label(fig[0, :],
      "PI-DeepONet on 1D Poisson — trained without solver data",
      fontsize = 14)
fig

19.5 How PI-DeepONet differs from PINN + DeepONet

PINN DeepONet PI-DeepONet
Learns One PDE solution Solution operator Solution operator
Training data None (physics only) Input–output pairs from solver None or few (physics + optional data)
Generalizes to new inputs? No (retrain) Yes Yes
PDE knowledge used? Yes (residual loss) No Yes (residual loss)
Solver needed? No Yes No (or optional)

19.6 The training trade-off

PI-DeepONet is more expensive per training step than standard DeepONet because it requires computing PDE derivatives via AD for each collocation point. However, it saves the offline cost of generating training data with a PDE solver. The break-even depends on:

  • Solver cost — if the PDE is cheap to solve (e.g. 1D Poisson), generate data and use standard DeepONet. If expensive (e.g. 3D wave equation), PI-DeepONet saves significant compute.
  • Number of input functions — PI-DeepONet scales well because it can sample random input functions during training without ever solving the PDE.

19.7 Geoscience milestones

  • Seismic operator learningLin et al. (2021) is the representative reference for PI-DeepONet applied to seismic waveform operators, with the wave equation enforced during training.
  • Groundwater flowHe et al. (2020) is the canonical reference for physics-informed operator learning constrained by Darcy’s law and the groundwater equation.
  • Fracture mechanicsGoswami et al. (2023) demonstrates PI-DeepONet on stress-intensity-factor prediction under varying loading conditions, a flagship multi-physics use case.
Practical tip

When training PI-DeepONet, start with a strong weight on the boundary-condition loss (\(\lambda_{bc} \approx 10\)\(100\)) and a moderate weight on the PDE residual (\(\lambda_r = 1\)). The network must first learn to satisfy the boundary conditions before the residual becomes meaningful — otherwise the optimizer drifts toward trivial constants.