16  Physics-Informed Neural Networks

Key references
  • Early neural-network PDE solvers — using neural networks to solve ODEs and PDEs by embedding the equation in the loss function (Lagaris et al., 1998).
  • Physics-Informed Neural Networks (PINNs) — the foundational framework that popularized training neural networks with PDE residuals as soft constraints (Raissi et al., 2019).
  • PINN review — comprehensive review of physics-informed machine learning covering PINNs, operator learning, and hybrid methods (Karniadakis et al., 2021).
  • DeepXDE — a widely used library for PINNs that formalized many practical training strategies (Lu et al., 2021).
  • Gradient pathologies — understanding and mitigating the training difficulties unique to PINNs (Wang et al., 2021).
  • Neural ODEs — treating the forward pass of a neural network as an ODE, connecting deep learning and differential equations (Chen et al., 2018).
  • Universal differential equations — embedding neural networks inside differential equations to learn unknown physics (Rackauckas et al., 2020).

In the previous part of this book we treated neural networks as function approximators: given input–output pairs, the network learns to map one to the other. Physics-informed neural networks (PINNs) take a fundamentally different approach. Instead of (or in addition to) learning from data, the network is trained to satisfy a known physical law — typically a partial differential equation (PDE). The physics itself becomes part of the loss function.

16.1 The idea

Suppose we want to solve the PDE:

\[ \mathcal{N}[u](\mathbf{x}, t) = 0, \qquad \mathbf{x} \in \Omega, \quad t \in [0, T] \]

with boundary conditions \(\mathcal{B}[u] = 0\) on \(\partial\Omega\) and initial conditions \(u(\mathbf{x}, 0) = u_0(\mathbf{x})\).

A PINN approximates the solution \(u(\mathbf{x}, t)\) with a neural network \(u_\theta(\mathbf{x}, t)\), where \(\theta\) denotes the trainable parameters. The network takes coordinates \((\mathbf{x}, t)\) as input and outputs the predicted solution value.

The key insight is that because the network is differentiable, we can compute \(\frac{\partial u_\theta}{\partial t}\), \(\frac{\partial^2 u_\theta}{\partial x^2}\), etc. using automatic differentiation. We then define the PDE residual:

\[ r_\theta(\mathbf{x}, t) = \mathcal{N}[u_\theta](\mathbf{x}, t) \]

and train the network to make this residual zero everywhere.

16.2 The PINN loss function

The total loss combines three named components:

\[ \mathcal{L}(\theta) = \lambda_r \mathcal{L}_r(\theta) + \lambda_b \mathcal{L}_{\mathrm{bc}}(\theta) + \lambda_0 \mathcal{L}_0(\theta) \]

with

\[ \mathcal{L}_r(\theta) = \frac{1}{N_r}\sum_{i=1}^{N_r} |r_\theta(\mathbf{x}_i^r, t_i^r)|^2 \]

\[ \mathcal{L}_{\mathrm{bc}}(\theta) = \frac{1}{N_b}\sum_{j=1}^{N_b} |\mathcal{B}[u_\theta](\mathbf{x}_j^b, t_j^b)|^2 \]

\[ \mathcal{L}_0(\theta) = \frac{1}{N_0}\sum_{k=1}^{N_0} |u_\theta(\mathbf{x}_k^0, 0) - u_0(\mathbf{x}_k^0)|^2 \]

where:

  • \(\{(\mathbf{x}_i^r, t_i^r)\}\) are collocation points sampled inside the domain where the PDE must hold.
  • \(\{(\mathbf{x}_j^b, t_j^b)\}\) are points on the boundary.
  • \(\{(\mathbf{x}_k^0)\}\) are points at \(t=0\).
  • \(\lambda_r, \lambda_b, \lambda_0\) are weighting factors.

If observational data is also available, a fourth term \(\mathcal{L}_{\mathrm{data}}\) can be added, making the PINN a hybrid physics-data model.

16.3 NeuralPDE.jl: a symbolic interface for PINNs

Writing the residual loss by hand — sampling collocation points, computing derivatives with Zygote, and balancing the weight factors — is instructive but quickly becomes tedious for non-trivial PDEs. The Julia package NeuralPDE.jl automates this. The user states the PDE, boundary conditions, and domain symbolically through ModelingToolkit.jl; NeuralPDE.jl then discretizes the residual on collocation points, wires up automatic differentiation, and returns a standard Optimization.jl problem.

The high-level workflow is:

  1. Declare independent variables, dependent variables, and differential operators.
  2. Write the PDE and the initial/boundary conditions as symbolic equations.
  3. Define the domain as a set of intervals.
  4. Build a Lux chain for \(u_\theta\) and pass it to PhysicsInformedNN.
  5. Call discretize to obtain an OptimizationProblem.
  6. Solve it with any Optimization.jl optimizer (e.g. Adam, then BFGS).

16.4 Code example: solving the 1D heat equation

We solve the heat equation \(\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}\) on \(x \in [0, 1]\), \(t \in [0, 1]\) with initial condition \(u(x, 0) = \sin(\pi x)\) and boundary conditions \(u(0, t) = u(1, t) = 0\). The exact solution is \(u(x,t) = e^{-\alpha \pi^2 t} \sin(\pi x)\).

using NeuralPDE, Lux, ModelingToolkit, Optimization, OptimizationOptimisers,
      OptimizationOptimJL, Random, Printf, CairoMakie
import ModelingToolkit: Interval

rng = Xoshiro(42)
α = 0.01

u_exact(x, t) = exp(-α * π^2 * t) * sin(π * x)
#-----symbolic PDE description------
@parameters x t
@variables u(..)
Dt  = Differential(t)
Dxx = Differential(x)^2

eq = Dt(u(x, t)) ~ α * Dxx(u(x, t))

bcs = [u(x, 0) ~ sin(π * x),
       u(0, t) ~ 0.0,
       u(1, t) ~ 0.0]

domains = [x  Interval(0.0, 1.0),
           t  Interval(0.0, 1.0)]

@named pdesys = PDESystem(eq, bcs, domains, [x, t], [u(x, t)])
#-----PINN architecture------
chain = Chain(Dense(2, 32, tanh),
              Dense(32, 32, tanh),
              Dense(32, 32, tanh),
              Dense(32, 1))

#-----collocation strategy------
strategy = GridTraining(0.05)
discretization = PhysicsInformedNN(chain, strategy)
prob = discretize(pdesys, discretization)
#-----training: Adam (warm start) then BFGS (sharpen)------
iter = Ref(0)
callback = function (p, l)
    iter[] += 1
    (iter[] == 1 || iter[] % 200 == 0) && @printf "iter %5d  loss = %.3e\n" iter[] l
    return false
end

res  = Optimization.solve(prob, Adam(1f-2); callback = callback, maxiters = 1500)
res  = Optimization.solve(remake(prob, u0 = res.u), BFGS();
                          callback = callback, maxiters = 500)
phi  = discretization.phi
#-----evaluate the PINN solution on a grid------
nx, nt = 80, 80
xs = collect(range(0, 1, length = nx))
ts = collect(range(0, 1, length = nt))

U_pinn = [first(phi([xi, tj], res.u)) for xi in xs, tj in ts]
U_true = [u_exact(xi, tj) for xi in xs, tj in ts]

fig = Figure(size = (820, 280))
ax1 = Axis(fig[1, 1], title = "PINN solution",  xlabel = "x", ylabel = "t")
ax2 = Axis(fig[1, 2], title = "Exact solution", xlabel = "x", ylabel = "t")
ax3 = Axis(fig[1, 3], title = "Absolute error", xlabel = "x", ylabel = "t")
heatmap!(ax1, xs, ts, U_pinn, colormap = :viridis)
heatmap!(ax2, xs, ts, U_true, colormap = :viridis)
heatmap!(ax3, xs, ts, abs.(U_pinn .- U_true), colormap = :hot)
fig

A few observations worth keeping in mind when comparing this to a hand-written PINN:

  • No explicit residual code. NeuralPDE.jl reads the symbolic equation and constructs the loss for us. Switching from the heat equation to a wave equation is a one-line change in eq.
  • Strategy is configurable. GridTraining(0.05) evaluates the residual on a fixed grid. Alternatives include StochasticTraining, QuasiRandomTraining, and QuadratureTraining — each with different bias/variance trade-offs for the residual estimator.
  • Two-stage optimization. It is common to use Adam first (cheap, noisy gradients) and finish with BFGS (sharper convergence near the minimum). The remake call hands the Adam solution to BFGS as a warm start.

16.5 PINN strengths and limitations

Strengths:

  • Mesh-free — no need to build a computational mesh. The network is evaluated at arbitrary points.
  • Inverse problems — if some physical parameters are unknown, they can be made trainable alongside the network weights. The PINN then simultaneously solves the PDE and infers the unknown parameters.
  • Data assimilation — observational data can be incorporated by adding a data-fitting term to the loss, making PINNs natural for problems where sparse data and physics must both be honored.

Limitations:

  • Training difficulty — PINNs can be hard to train, especially for stiff PDEs, multi-scale problems, or high-frequency solutions. Gradient pathologies and loss balancing are active research challenges (Wang et al., 2021).
  • Computational cost — computing second-order derivatives through automatic differentiation is expensive. For problems where a traditional numerical solver is efficient, PINNs may not be competitive.
  • Generalization — a trained PINN solves one specific instance of a PDE (one set of boundary/initial conditions). To solve the same PDE with different conditions, you must retrain. This is where neural operators (next chapter) offer an advantage.

16.6 Geoscience milestones

  • Seismic travel times via the eikonal equationSmith et al. (2021) (EikoNet) and Waheed et al. (2021) (PINNeik) are the canonical references for embedding the eikonal equation in a PINN to compute travel times in heterogeneous 3D velocity models.
  • Subsurface flow with physics-constrained deep learningZhu et al. (2019) established physics-constrained deep learning as a high-dimensional surrogate-modelling and uncertainty-quantification tool for subsurface transport.
  • Physics-informed ML for geophysics — reviewAlkhalifah et al. (2022) surveys physics-informed approaches across seismic imaging, inversion, and interpretation.