using Lux, Random, Optimisers, Zygote, Statistics, LinearAlgebra,
Printf, CairoMakie
rng = Xoshiro(42)
#-----shared discretization on the branch side------
n_sensors = 64
x_sensors = Float32.(range(0, 1, length = n_sensors))
dx = x_sensors[2] - x_sensors[1]19 Physics-Informed DeepONet
- Physics-Informed DeepONets — the foundational paper combining DeepONet with PDE residual constraints, enabling operator learning with little or no labeled data (S. Wang et al., 2021).
- DeepONet — the underlying architecture based on the universal approximation theorem for operators (Lu et al., 2021).
- PI-DeepONet for fracture mechanics — application of physics-informed operator learning to complex multi-physics problems (Goswami et al., 2023).
- Reliable extrapolation — improving neural operator generalization through physics constraints and sparse observations (M. Wang et al., 2023).
- Physics-informed ML review — comprehensive treatment of PINNs, neural operators, and hybrid approaches (Karniadakis et al., 2021).
The DeepONet chapter showed how to learn a solution operator from input–output function pairs. But generating those pairs requires running a traditional PDE solver many times — which can be expensive. Physics-Informed DeepONet (PI-DeepONet) (S. Wang et al., 2021) eliminates or reduces this requirement by embedding the PDE directly into the DeepONet training loss, just as PINNs embed the PDE into a single-instance solver.
19.1 The key idea
Recall the DeepONet prediction:
\[ \mathcal{G}_\theta(a)(y) = \sum_{k=1}^{p} b_k(a) \cdot \tau_k(y) + b_0 \]
In a standard DeepONet, we train by minimizing the data loss:
\[ \mathcal{L}_{\mathrm{data}}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{Q} \bigl|\mathcal{G}_\theta(a_i)(y_j) - u_i(y_j)\bigr|^2 \]
where the \(u_i\) are computed by a PDE solver. In PI-DeepONet, we replace (or supplement) this with a PDE residual loss: since the DeepONet output is differentiable with respect to the query location \(y\), we can compute \(\frac{\partial \mathcal{G}_\theta}{\partial y}\), \(\frac{\partial^2 \mathcal{G}_\theta}{\partial y^2}\), etc. via automatic differentiation and penalize violations of the PDE.
19.2 PI-DeepONet loss function
\[ \mathcal{L}(\theta) = \lambda_r \mathcal{L}_r(\theta) + \lambda_{\mathrm{bc}} \mathcal{L}_{\mathrm{bc}}(\theta) \]
with
\[ \mathcal{L}_r(\theta) = \frac{1}{N \cdot N_c}\sum_{i=1}^{N}\sum_{j=1}^{N_c}\bigl|\mathcal{N}[\mathcal{G}_\theta(a_i)](y_j^c) - a_i(y_j^c)\bigr|^2 \]
\[ \mathcal{L}_{\mathrm{bc}}(\theta) = \frac{1}{N \cdot N_{\mathrm{bc}}}\sum_{i=1}^{N}\sum_{k=1}^{N_{\mathrm{bc}}}\bigl|\mathcal{G}_\theta(a_i)(y_k^{\mathrm{bc}}) - g_i(y_k^{\mathrm{bc}})\bigr|^2 \]
The crucial difference from a PINN: the loss is evaluated over many different input functions \(a_i\) simultaneously, so the network learns an operator, not just one solution. And the crucial difference from a standard DeepONet: no PDE solver is needed to generate training data — the physics is enforced directly.
19.3 When to use PI-DeepONet
Have training data?
┌───────┴───────┐
Yes No
│ │
Standard PI-DeepONet
DeepONet (physics only)
│
Want better
generalization?
┌────┴────┐
No Yes
│ │
Standard PI-DeepONet
DeepONet (physics + data)
PI-DeepONet is most valuable when:
- No training data — you know the PDE but can’t afford to run a solver thousands of times.
- Limited data — you have a few solver runs and want to supplement with physics.
- Extrapolation — physics constraints improve generalization beyond the training distribution.
19.4 Code example: 1D Poisson operator without a solver
We learn the same operator as the previous chapter — the Poisson solution map
\[ -\frac{d^2 u}{dx^2}(x) = f(x), \quad x \in [0,1], \quad u(0) = u(1) = 0, \]
but this time we never call the finite-difference solver. The network is supervised purely by the PDE residual and the Dirichlet boundary conditions, evaluated on randomly drawn sources \(f(x)\) (see the 1D_DeepONet reference in Multiverse.jl for the data-driven counterpart).
#-----random source functions (sum of sinusoids)------
function random_source(rng)
n_modes = rand(rng, 2:5)
coeffs = 0.5f0 .* randn(rng, Float32, n_modes)
freqs = Float32.(rand(rng, 1:5, n_modes))
phases = 2π .* rand(rng, Float32, n_modes)
function f(x)
val = 0.0f0
for k in 1:n_modes
val += coeffs[k] * sin(2π * freqs[k] * x + phases[k])
end
return val
end
return f
end
function sample_batch(rng, n)
A = zeros(Float32, n_sensors, n)
funcs = Vector{Any}(undef, n)
for i in 1:n
f = random_source(rng)
A[:, i] = f.(x_sensors)
funcs[i] = f
end
return A, funcs
end#-----branch + trunk (trunk takes scalar x in [0,1])------
p = 64
branch_net = Chain(
Dense(n_sensors => 128, tanh),
Dense(128 => 128, tanh),
Dense(128 => p),
)
trunk_net = Chain(
Dense(1 => 128, tanh),
Dense(128 => 128, tanh),
Dense(128 => p),
)
ps_branch, st_branch = Lux.setup(rng, branch_net)
ps_trunk, st_trunk = Lux.setup(rng, trunk_net)
bias_init = zeros(Float32, 1)
ps = (branch = ps_branch, trunk = ps_trunk, bias = bias_init)
st = (branch = st_branch, trunk = st_trunk)#-----scalar DeepONet output for one (a, x) pair — used for AD wrt x------
function deeponet_scalar(a_sensors, x_scalar, ps, st)
b_out, _ = branch_net(reshape(a_sensors, :, 1), ps.branch, st.branch) # (p, 1)
t_out, _ = trunk_net(reshape([x_scalar], 1, 1), ps.trunk, st.trunk) # (p, 1)
return sum(b_out .* t_out) + ps.bias[1]
end#-----PI-DeepONet loss: PDE residual + Dirichlet BCs (no solver in the loop)------
n_funcs = 64
n_colloc = 32
function pi_loss(ps, st)
A_batch, funcs = sample_batch(rng, n_funcs)
loss_pde = 0.0f0
loss_bc = 0.0f0
n_pde = 0
n_bc = 0
for i in 1:n_funcs
a_i = A_batch[:, i]
f_i = funcs[i]
#----- PDE residual: -u''(x) = f(x) on interior collocation points -----
x_c = 0.02f0 .+ 0.96f0 .* rand(rng, Float32, n_colloc)
for j in 1:n_colloc
xj = x_c[j]
d2u_dx2 = Zygote.gradient(
x -> Zygote.gradient(z -> deeponet_scalar(a_i, z, ps, st), x)[1],
xj,
)[1]
residual = -d2u_dx2 - f_i(xj)
loss_pde += residual^2
n_pde += 1
end
#----- Dirichlet boundary conditions u(0) = u(1) = 0 -----
u0 = deeponet_scalar(a_i, 0.0f0, ps, st)
u1 = deeponet_scalar(a_i, 1.0f0, ps, st)
loss_bc += u0^2 + u1^2
n_bc += 2
end
total = loss_pde / n_pde + 50.0f0 * loss_bc / n_bc
return total, st
end#-----training (per-point AD makes each step expensive — small budget)------
opt = Adam(1f-3)
opt_state = Optimisers.setup(opt, ps)
for epoch in 1:300
(loss_val, st_new), grads = Zygote.withgradient(ps) do p
pi_loss(p, st)
end
opt_state, ps = Optimisers.update(opt_state, ps, grads[1])
st = st_new
if epoch == 1 || epoch % 50 == 0
@printf "epoch %3d loss = %.5f\n" epoch loss_val
end
end#-----reference solution via finite differences for comparison------
function poisson_fd(f_vals)
n = length(f_vals); m = n - 2
A = SymTridiagonal(fill(2.0f0, m), fill(-1.0f0, m - 1))
u = zeros(Float32, n)
u[2:end-1] = A \ (f_vals[2:end-1] .* dx^2)
return u
end
fig = Figure(size = (820, 520))
for i in 1:4
rng_test = Xoshiro(2000 + i)
f_test = random_source(rng_test)
a_test = f_test.(x_sensors)
u_ref = poisson_fd(a_test)
x_eval = Float32.(range(0, 1, length = 200))
u_pred = [deeponet_scalar(a_test, xj, ps, st) for xj in x_eval]
row, col = fldmod1(i, 2)
ax = Axis(fig[row, col], xlabel = "x", ylabel = "u(x)",
title = "Test source $i")
lines!(ax, x_sensors, a_test, color = (:gray, 0.5), label = "f(x)")
lines!(ax, x_sensors, u_ref, color = :black, linewidth = 2, label = "FD u(x)")
lines!(ax, x_eval, u_pred, color = :steelblue, linewidth = 2,
linestyle = :dash, label = "PI-DeepONet")
i == 1 && axislegend(ax, position = :rt, labelsize = 9)
end
Label(fig[0, :],
"PI-DeepONet on 1D Poisson — trained without solver data",
fontsize = 14)
fig19.5 How PI-DeepONet differs from PINN + DeepONet
| PINN | DeepONet | PI-DeepONet | |
|---|---|---|---|
| Learns | One PDE solution | Solution operator | Solution operator |
| Training data | None (physics only) | Input–output pairs from solver | None or few (physics + optional data) |
| Generalizes to new inputs? | No (retrain) | Yes | Yes |
| PDE knowledge used? | Yes (residual loss) | No | Yes (residual loss) |
| Solver needed? | No | Yes | No (or optional) |
19.6 The training trade-off
PI-DeepONet is more expensive per training step than standard DeepONet because it requires computing PDE derivatives via AD for each collocation point. However, it saves the offline cost of generating training data with a PDE solver. The break-even depends on:
- Solver cost — if the PDE is cheap to solve (e.g. 1D Poisson), generate data and use standard DeepONet. If expensive (e.g. 3D wave equation), PI-DeepONet saves significant compute.
- Number of input functions — PI-DeepONet scales well because it can sample random input functions during training without ever solving the PDE.
19.7 Geoscience milestones
- Seismic operator learning — Lin et al. (2021) is the representative reference for PI-DeepONet applied to seismic waveform operators, with the wave equation enforced during training.
- Groundwater flow — He et al. (2020) is the canonical reference for physics-informed operator learning constrained by Darcy’s law and the groundwater equation.
- Fracture mechanics — Goswami et al. (2023) demonstrates PI-DeepONet on stress-intensity-factor prediction under varying loading conditions, a flagship multi-physics use case.
When training PI-DeepONet, start with a strong weight on the boundary-condition loss (\(\lambda_{bc} \approx 10\)–\(100\)) and a moderate weight on the PDE residual (\(\lambda_r = 1\)). The network must first learn to satisfy the boundary conditions before the residual becomes meaningful — otherwise the optimizer drifts toward trivial constants.