18  Physics-Based ML for Inversion

Tip: Key references
  • Implicit neural representations (INRs) — coordinate-based networks that represent continuous signals, originally popularized for 3D scene reconstruction (Mildenhall et al., 2022; Sitzmann et al., 2020).
  • Fourier feature encodings — overcoming spectral bias of MLPs by mapping inputs through Fourier features, enabling learning of high-frequency detail (Tancik et al., 2020).
  • PINNs for inversion — embedding PDE constraints in the loss to simultaneously solve forward and inverse problems (Raissi et al., 2019).
  • Physics-informed review — broad treatment of physics-informed ML including PDE-constrained optimization and hybrid approaches (Karniadakis et al., 2021).
  • Self-adaptive PINNs — learning the loss-weighting factors alongside the network parameters to balance PDE residual and data terms (McClenny & Braga-Neto, 2023).

In the previous chapter we used PINNs to solve a PDE: the network represented the solution field and was trained purely from the governing equation. Here we tackle a harder and more practically important problem: geophysical inversion. The goal is to recover an unknown physical property (e.g. subsurface conductivity, seismic velocity, or density) from measured data, while honoring a known PDE that relates the property to the observations.

The idea is simple but powerful: represent the unknown property field as a neural network (an implicit neural representation, or INR), and train it by minimizing the PDE residual together with a data-misfit term.

18.1 Why not just use a PINN?

A standard PINN parameterizes the solution \(u(\mathbf{x})\) as a network and trains it to satisfy the PDE. In an inverse problem, both the solution \(u\) and the material property \(m(\mathbf{x})\) are unknown. We observe noisy data \(d^{\text{obs}}\) at a few measurement locations, and we know the PDE:

\[ \mathcal{N}[u; m] = 0 \quad \text{in } \Omega \]

A physics-based ML inversion parameterizes the model \(m_\theta(\mathbf{x})\) as a neural network and computes the PDE solution (or its residual) as part of the training loop. This approach has two key advantages:

  1. Regularization through architecture — the network implicitly regularizes the model: spectral bias makes an MLP favor smooth, low-frequency functions, which often reduces the need for explicit Tikhonov-style regularization (though it does not always eliminate it).
  2. Mesh-free, continuous representation — the recovered model is a continuous function that can be queried at any point, not tied to a fixed grid.

18.2 The inversion framework

┌──────────────────────────────────────────────────────────────────┐
│  Coordinates (x, z) ──▶ INR network m_θ(x, z) ──▶ property     │
│                              │                                    │
│                              ▼                                    │
│          PDE solver (or PDE residual)  →  predicted data d_pred   │
│                              │                                    │
│                              ▼                                    │
│  Loss = ‖d_pred − d_obs‖² + λ · PDE residual + regularization   │
│                              │                                    │
│                   backprop through everything                     │
│                   update θ with gradient descent                  │
└──────────────────────────────────────────────────────────────────┘

The total loss is:

\[ \mathcal{L}(\theta) = \mathcal{L}_{\mathrm{data}}(\theta) + \lambda_r \mathcal{L}_r(\theta) \]

where

\[ \mathcal{L}_{\mathrm{data}}(\theta) = \frac{1}{N_d}\sum_{i=1}^{N_d}\bigl(d_i^{\mathrm{pred}}(\theta) - d_i^{\mathrm{obs}}\bigr)^2 \]

\[ \mathcal{L}_r(\theta) = \frac{1}{N_c}\sum_{j=1}^{N_c}\bigl|\mathcal{N}[u; m_\theta](\mathbf{x}_j)\bigr|^2 \]

In this chapter I use \(d^{\mathrm{obs}}\) and \(d^{\mathrm{pred}}\) instead of \(y\) and \(\hat{y}\) because the distinction between measured and predicted data is part of the geophysical setup, not just generic regression notation.

18.3 Implicit neural representations (INRs)

An INR is simply a coordinate-based MLP:

\[ m_\theta: \mathbb{R}^d \to \mathbb{R}, \quad (x_1, \ldots, x_d) \mapsto m_\theta(x_1, \ldots, x_d) \]

The network takes spatial coordinates as input and outputs the physical property at that location. By construction, the output is a continuous, differentiable function of position.
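As a minimal sketch using the same Lux.jl stack as the worked example below (the layer widths and seed here are illustrative, not a recommendation), a 2D INR is just an MLP applied to stacked coordinates:

```julia
using Lux, Random

# A minimal 2D INR: (x, z) coordinates in, one property value out.
# Layer widths are illustrative only.
inr = Chain(Dense(2 => 32, tanh), Dense(32 => 32, tanh), Dense(32 => 1))

rng = Xoshiro(0)
ps, st = Lux.setup(rng, inr)

# Columns are query points, rows are coordinates (x on top, z below).
coords = Float32[0.1 0.5 0.9;
                 0.2 0.4 0.8]
m_vals, _ = inr(coords, ps, st)
size(m_vals)  # (1, 3): one property value per query point
```

Because the network is a closed-form function of the coordinates, the same `inr` can be queried on any grid, or at scattered points, without interpolation.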

18.3.1 Spectral bias and positional encoding

Plain MLPs suffer from spectral bias — they learn low-frequency components first and struggle with high-frequency detail (Tancik et al., 2020). For geophysical models with sharp boundaries or fine-scale structure, this is a problem.

Fourier feature encoding addresses this by mapping input coordinates through sinusoidal functions before passing them to the network:

\[ \gamma(\mathbf{x}) = \bigl[\sin(2\pi \mathbf{B}\mathbf{x}),\; \cos(2\pi \mathbf{B}\mathbf{x})\bigr] \]

where \(\mathbf{B} \in \mathbb{R}^{m \times d}\) is a random matrix (sampled once and fixed). The frequency scale of \(\mathbf{B}\) controls the spatial resolution the network can represent.

18.4 Code example: 1D steady-state diffusion inversion

We solve a toy inverse problem: recover a spatially varying diffusion coefficient \(\kappa(x)\) from noisy observations of the solution field \(u(x)\).

Forward problem (steady-state diffusion):

\[ -\frac{d}{dx}\!\left[\kappa(x)\,\frac{du}{dx}\right] = f(x), \quad x \in [0, 1] \]

with \(u(0) = 0\), \(u(1) = 0\) and a known source \(f(x) = \sin(2\pi x)\).

Inverse problem: Given noisy measurements of \(u\) at a few sensor locations, recover \(\kappa(x)\).

using Lux, Random, Optimisers, Zygote, Statistics, Printf, CairoMakie

rng = Xoshiro(42)
# True diffusion coefficient (what we want to recover)
κ_true(x) = 1.0f0 + 0.5f0 * sin(2π * x)

# Source term
f_source(x) = sin(2π * x)

# Generate "observed" data by solving the ODE numerically (finite differences)
nx_fd = 200
x_fd = Float32.(range(0, 1, length = nx_fd))
dx_fd = x_fd[2] - x_fd[1]

# Build FD system: -d/dx[κ du/dx] = f
# Using central differences
A = zeros(Float32, nx_fd, nx_fd)
rhs = zeros(Float32, nx_fd)

A[1, 1] = 1.0f0   # u(0) = 0
A[end, end] = 1.0f0  # u(1) = 0

for i in 2:nx_fd-1
    κ_plus  = 0.5f0 * (κ_true(x_fd[i]) + κ_true(x_fd[i+1]))
    κ_minus = 0.5f0 * (κ_true(x_fd[i]) + κ_true(x_fd[i-1]))
    A[i, i-1] = -κ_minus / dx_fd^2
    A[i, i]   = (κ_plus + κ_minus) / dx_fd^2
    A[i, i+1] = -κ_plus / dx_fd^2
    rhs[i] = f_source(x_fd[i])
end

u_fd = A \ rhs

# Sample noisy observations at sparse locations
n_obs = 20
obs_idx = sort(randperm(rng, nx_fd - 2)[1:n_obs] .+ 1)  # distinct interior indices
x_obs = x_fd[obs_idx]
u_obs = u_fd[obs_idx] .+ 0.005f0 .* randn(rng, Float32, n_obs)
# Plot the true solution and observations
fig = Figure(size = (700, 250))
ax1 = Axis(fig[1, 1], xlabel = "x", ylabel = "u(x)", title = "Solution field")
lines!(ax1, x_fd, u_fd, color = :black, linewidth = 2, label = "True u(x)")
scatter!(ax1, x_obs, u_obs, color = :red, markersize = 8, label = "Observations")
axislegend(ax1, position = :rt)

ax2 = Axis(fig[1, 2], xlabel = "x", ylabel = "κ(x)", title = "Diffusion coefficient")
lines!(ax2, x_fd, κ_true.(x_fd), color = :black, linewidth = 2, label = "True κ(x)")
axislegend(ax2, position = :rb)
fig
#-----κ representation------
n_features = 32
B_matrix = randn(rng, Float32, n_features, 1) .* 4.0f0  # frequency scale

function fourier_encode(x, B)
    projected = B * reshape(x, 1, :)  # (n_features, n_points)
    return vcat(sin.(2π .* projected), cos.(2π .* projected))  # (2*n_features, n_points)
end

#-----κ-network------
κ_model = Chain(
    Dense(2 * n_features => 64, tanh),
    Dense(64 => 64, tanh),
    Dense(64 => 1)
)

#-----u-network------
u_model = Chain(
    Dense(2 * n_features => 64, tanh),
    Dense(64 => 64, tanh),
    Dense(64 => 1)
)

ps_κ, st_κ = Lux.setup(rng, κ_model)
ps_u, st_u = Lux.setup(rng, u_model)
#-----combined loss------
function inversion_loss(ps_κ, ps_u, st_κ, st_u)
    λᵣ = 1.0f0
    λᵦ = 10.0f0

    # Evaluate κ and u at collocation points
    n_colloc = 100
    x_c = Float32.(range(0.01, 0.99, length = n_colloc))
    x_c_enc = fourier_encode(x_c, B_matrix)
    
    # Predict κ at collocation points (enforce positivity with softplus)
    κ_pred_raw, _ = κ_model(x_c_enc, ps_κ, st_κ)
    κ_pred = Lux.NNlib.softplus.(κ_pred_raw)  # ensure κ > 0
    
    # Predict u at collocation points
    u_pred_c, _ = u_model(x_c_enc, ps_u, st_u)
    
    # PDE residual: -d/dx[κ du/dx] - f = 0.
    # Spatial derivatives are taken with finite differences on the collocation
    # grid (dense enough for accuracy); AD still backpropagates through the
    # networks that produce κ and u.
    δx = x_c[2] - x_c[1]
    u_vec = vec(u_pred_c)
    κ_vec = vec(κ_pred)
    
    # du/dx at half-points
    du_dx = diff(u_vec) ./ δx  # (n_colloc - 1,)
    # κ at half-points
    κ_half = 0.5f0 .* (κ_vec[1:end-1] .+ κ_vec[2:end])
    # flux: κ * du/dx
    flux = κ_half .* du_dx
    # d(flux)/dx at interior points
    d_flux_dx = diff(flux) ./ δx  # (n_colloc - 2,)
    
    f_vals = f_source.(x_c[2:end-1])
    pde_residual = -d_flux_dx .- f_vals
    loss_pde = mean(pde_residual .^ 2)
    
    # Data misfit: compare u predictions at observation locations
    x_obs_enc = fourier_encode(x_obs, B_matrix)
    u_pred_obs, _ = u_model(x_obs_enc, ps_u, st_u)
    loss_data = mean((vec(u_pred_obs) .- u_obs) .^ 2)
    
    # Boundary conditions: u(0) = 0, u(1) = 0
    x_bc = Float32[0.0, 1.0]
    x_bc_enc = fourier_encode(x_bc, B_matrix)
    u_bc, _ = u_model(x_bc_enc, ps_u, st_u)
    loss_bc = mean(u_bc .^ 2)
    
    total = loss_data + λᵣ * loss_pde + λᵦ * loss_bc
    return total
end
# Train both networks jointly
opt_κ = Adam(0.001f0)
opt_u = Adam(0.001f0)
opt_state_κ = Optimisers.setup(opt_κ, ps_κ)
opt_state_u = Optimisers.setup(opt_u, ps_u)

for epoch in 1:2000
    global ps_κ, ps_u, opt_state_κ, opt_state_u  # the script-level loop mutates globals
    (loss_val), grads = Zygote.withgradient(ps_κ, ps_u) do pκ, pu
        inversion_loss(pκ, pu, st_κ, st_u)
    end
    
    opt_state_κ, ps_κ = Optimisers.update(opt_state_κ, ps_κ, grads[1])
    opt_state_u, ps_u = Optimisers.update(opt_state_u, ps_u, grads[2])
    
    if epoch == 1 || epoch % 400 == 0
        @printf "Epoch %4d  Loss = %.6f\n" epoch loss_val
    end
end
# Compare recovered κ to the true one
x_plot = Float32.(range(0, 1, length = 200))
x_plot_enc = fourier_encode(x_plot, B_matrix)

κ_recovered_raw, _ = κ_model(x_plot_enc, ps_κ, st_κ)
κ_recovered = vec(Lux.NNlib.softplus.(κ_recovered_raw))

u_recovered, _ = u_model(x_plot_enc, ps_u, st_u)
u_recovered = vec(u_recovered)

fig = Figure(size = (700, 300))
ax1 = Axis(fig[1, 1], xlabel = "x", ylabel = "κ(x)",
           title = "Recovered diffusion coefficient")
lines!(ax1, x_plot, κ_true.(x_plot), color = :black, linewidth = 2,
       label = "True κ(x)")
lines!(ax1, x_plot, κ_recovered, color = :steelblue, linewidth = 2,
       linestyle = :dash, label = "INR recovered")
axislegend(ax1, position = :rb)

ax2 = Axis(fig[1, 2], xlabel = "x", ylabel = "u(x)",
           title = "Recovered solution field")
lines!(ax2, x_fd, u_fd, color = :black, linewidth = 2, label = "True u(x)")
lines!(ax2, x_plot, u_recovered, color = :steelblue, linewidth = 2,
       linestyle = :dash, label = "INR recovered")
scatter!(ax2, x_obs, u_obs, color = :red, markersize = 6, label = "Observations")
axislegend(ax2, position = :rt)
fig

18.5 Why INRs work well for geophysical inversion

  1. Implicit regularization — the network architecture itself acts as a regularizer. Smooth activation functions (like tanh) and limited network width naturally produce smooth model outputs without requiring explicit penalty terms.

  2. Spectral bias as a feature — for geophysical inverse problems (which are ill-posed), the MLP’s tendency to learn low frequencies first is actually beneficial: it recovers the large-scale structure first, then gradually adds detail, mimicking multi-scale inversion strategies.

  3. Continuous representation — unlike grid-based inversion, the recovered model is defined everywhere in the domain. This makes it easy to evaluate at arbitrary resolution, compute derivatives, or interface with mesh-free PDE solvers.

  4. Fourier features for control — by tuning the frequency scale of the Fourier feature encoding, you directly control the maximum spatial frequency the network can represent, providing explicit control over the model’s resolution.

18.6 Connection to traditional inversion

The INR approach is closely related to classical regularized inversion:

|                | Classical inversion                          | INR-based inversion                    |
|----------------|----------------------------------------------|----------------------------------------|
| Unknown        | Model vector \(\mathbf{m} \in \mathbb{R}^M\) | Network weights \(\theta\)             |
| Regularization | Explicit: \(\|\nabla m\|^2\), TV, etc.       | Implicit: architecture + spectral bias |
| Resolution     | Fixed by mesh                                | Continuous, adaptive                   |
| Gradient       | Adjoint-state method                         | Automatic differentiation              |
| Constraints    | Hard to impose smoothly                      | Natural via activation functions       |
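The "Gradient" row is worth making concrete: classically one derives \(\partial\mathcal{L}/\partial m\) by the adjoint-state method, while reverse-mode AD produces the same gradient mechanically. A sketch with a toy linear forward map `G` standing in for the PDE solve (the numbers are made up, not this chapter's operator):

```julia
using Zygote

# Toy linear forward map standing in for a PDE solve: d_pred = G * m.
G = Float32[1 0 2;
            0 3 1]                 # 2 observations, 3 model cells
d_obs = Float32[1.0, 2.0]

misfit(m) = sum(abs2, G * m .- d_obs)

m0 = zeros(Float32, 3)
g = Zygote.gradient(misfit, m0)[1]

# Reverse-mode AD reproduces the hand-derived adjoint gradient 2Gᵀ(Gm − d):
g ≈ 2 .* (G' * (G * m0 .- d_obs))  # true
```

When `G` is replaced by a nonlinear PDE solve, the hand-derived adjoint becomes tedious; AD through the solver (or through a discretized residual, as in this chapter's example) sidesteps that derivation.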

18.7 Geoscience applications

INR-based inversion is gaining traction across geoscience:

  • Gravity and magnetic inversion — INRs parameterize 3D density or susceptibility models and train with the forward gravity/magnetic kernel, replacing voxel-based parameterizations with continuous neural fields.
  • Seismic full-waveform inversion — the velocity model is represented as an INR, and the wave equation residual provides the physics constraint (Rasht-Behesht et al., 2022).
  • Electrical resistivity tomography — the conductivity distribution is learned as an INR trained with the governing Poisson equation and electrode measurements.
  • Geothermal reservoir characterization — physics-informed INRs have been applied to recover permeability fields from temperature and pressure observations (Sun et al., 2023).
Note: Looking ahead

This chapter combined a single PDE with a single set of observations. In practice, geophysical inversions involve multiple data types (e.g., gravity + seismic) and complex 3D geometries. The INR framework extends naturally to these settings by adding more physics terms to the loss and using higher-dimensional coordinate inputs.
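As a schematic only (the term names and unit weights are hypothetical, not part of this chapter's example), a joint inversion loss is a weighted sum of one misfit per data type plus the shared physics residual:

```julia
# Schematic joint loss: each term would come from its own forward
# operator, exactly as the single data term did in this chapter.
joint_loss(loss_grav, loss_seis, loss_pde; λ_g = 1.0f0, λ_s = 1.0f0, λ_r = 1.0f0) =
    λ_g * loss_grav + λ_s * loss_seis + λ_r * loss_pde

joint_loss(0.1f0, 0.2f0, 0.05f0)  # ≈ 0.35
```

Balancing the weights across data types is the hard part in practice; the self-adaptive weighting of McClenny & Braga-Neto (2023), cited at the top of the chapter, is one way to learn them during training.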