using Lux, Random, Optimisers, Zygote, Statistics, LinearAlgebra, FFTW, Printf, CairoMakie
rng = Xoshiro(42)20 Fourier Neural Operator
- Fourier Neural Operator (FNO) — learning solution operators in Fourier space, enabling resolution-invariant surrogate models for PDEs (Li et al., 2021).
- Neural operator survey — a comprehensive mathematical treatment of neural operators and their approximation properties (Kovachki et al., 2023).
- FourCastNet — global weather prediction using adaptive Fourier neural operators (Pathak et al., 2022).
- U-FNO — enhanced FNO with U-Net-style skip connections for multiphase flow in subsurface reservoirs (Wen et al., 2022).
- FNO on general geometries — extending FNO to handle irregular domains via learned deformations (Li et al., 2023).
- Neural operators for science — review of neural operators accelerating scientific simulations (Azizzadenesheli et al., 2024).
The DeepONet learned operators using a branch-trunk decomposition. The Fourier Neural Operator (FNO) (Li et al., 2021) takes a completely different approach: it performs learning directly in Fourier space, where global spatial patterns are captured efficiently and the learned operator is inherently resolution-invariant.
FNO has become arguably the most successful neural operator architecture, powering applications from weather forecasting to fluid dynamics to seismic inversion.
20.1 From convolution to spectral convolution
A standard convolution applies a local kernel:
\[ (K * v)(x) = \int K(x - y)\, v(y)\, dy \]
This has a local receptive field — each output point depends only on nearby inputs. For PDEs, where information propagates globally (e.g., waves, diffusion), stacking many convolutional layers is needed to capture long-range interactions.
By the convolution theorem, convolution in physical space is multiplication in Fourier space:
\[ \mathcal{F}[K * v] = \mathcal{F}[K] \cdot \mathcal{F}[v] \]
The FNO leverages this: instead of learning a spatial kernel, it learns a spectral filter — a weight tensor that multiplies the Fourier coefficients directly. This gives each layer a global receptive field while keeping the computation efficient through the FFT.
20.2 FNO architecture
An FNO consists of three stages:
20.2.1 1. Lifting layer
A point-wise linear layer that maps the input from its native dimension to a higher-dimensional representation:
\[ v^{(0)}(x) = P\, a(x) + \mathbf{q} \]
where \(P \in \mathbb{R}^{d_v \times d_a}\) and \(a(x)\) is the input function at location \(x\).
20.2.2 2. Fourier layers (repeated \(L\) times)
Each Fourier layer applies:
\[ v^{(l+1)}(x) = \sigma\!\Bigl(\underbrace{W^{(l)}\, v^{(l)}(x)}_{\text{local linear}} + \underbrace{\mathcal{F}^{-1}\!\bigl[R^{(l)} \cdot \mathcal{F}[v^{(l)}]\bigr](x)}_{\text{spectral convolution}}\Bigr) \]
The spectral convolution keeps only the lowest \(k_{\max}\) Fourier modes and sets the rest to zero. The learned weight tensor \(R^{(l)} \in \mathbb{C}^{d_v \times d_v \times k_{\max}}\) acts as a frequency-dependent linear transformation. Here the superscript \((l)\) plays the same role as elsewhere in the book: it indexes layer depth.
Input v^(l)(x)
│
├──────────────────────────────────┐
│ │
▼ ▼
FFT Local linear W^(l)
│
▼
Truncate to k_max modes
│
▼
Multiply by R^(l)
│
▼
Inverse FFT
│ │
└──────────┬───────────────────────┘
│ add
▼
Activation σ
│
▼
Output v^(l+1)(x)
20.2.3 3. Projection layer
A point-wise linear layer that maps back to the output dimension:
\[ u(x) = Q\, v^{(L)}(x) + r \]
20.2.4 Why this works
- Global receptive field — each Fourier layer sees the entire spatial domain, not just a local neighborhood. This matches the Green’s-function character of elliptic problems like Poisson, where every output point depends on the source everywhere.
- Resolution invariance — the spectral representation is independent of the grid resolution. A model trained on a \(64\)-point grid can be evaluated on a \(256\)-point grid by zero-padding the Fourier modes.
- Efficiency — the FFT costs \(O(n \log n)\), making the spectral convolution as fast as spatial convolution for typical grid sizes.
- Mode truncation as regularization — keeping only \(k_{\max}\) modes implicitly smooths the learned operator, preventing overfitting to high-frequency noise.
20.3 Code example: FNO for the 1D Poisson operator
We learn the same operator as the previous two chapters,
\[ -\frac{d^2 u}{dx^2}(x) = f(x), \qquad x \in [0, 1], \qquad u(0) = u(1) = 0, \]
so \(\mathcal{G}: f \mapsto u\), but with a Fourier neural operator on a fixed regular grid. This is also the test problem used in the 1D_DeepONet reference in Multiverse.jl, which makes side-by-side comparison with DeepONet and PI-DeepONet straightforward.
#-----grid + Poisson solver shared across all samples------
nx = 64
x_grid = Float32.(range(0, 1, length = nx))
dx = x_grid[2] - x_grid[1]
function poisson_fd(f_vals)
n = length(f_vals); m = n - 2
A = SymTridiagonal(fill(2.0f0, m), fill(-1.0f0, m - 1))
u = zeros(Float32, n)
u[2:end-1] = A \ (f_vals[2:end-1] .* dx^2)
return u
end
function generate_pair(rng)
n_modes = rand(rng, 2:6)
f = zeros(Float32, nx)
for _ in 1:n_modes
k = rand(rng, 1:6)
a = 0.5f0 * randn(rng, Float32)
ϕ = 2π * rand(rng, Float32)
f .+= a .* sin.(2π .* k .* x_grid .+ ϕ)
end
return f, poisson_fd(f)
end
n_train, n_test = 1000, 200
F_train = zeros(Float32, nx, n_train); U_train = zeros(Float32, nx, n_train)
F_test = zeros(Float32, nx, n_test); U_test = zeros(Float32, nx, n_test)
for i in 1:n_train
f, u = generate_pair(rng); F_train[:, i] = f; U_train[:, i] = u
end
for i in 1:n_test
f, u = generate_pair(Xoshiro(9000 + i)); F_test[:, i] = f; U_test[:, i] = u
end#-----custom spectral convolution layer------
struct SpectralConv <: Lux.AbstractLuxLayer
in_channels::Int
out_channels::Int
modes::Int
end
function Lux.initialparameters(rng::AbstractRNG, l::SpectralConv)
scale = 1.0f0 / (l.in_channels * l.out_channels)
return (R_real = scale .* randn(rng, Float32, l.out_channels, l.in_channels, l.modes),
R_imag = scale .* randn(rng, Float32, l.out_channels, l.in_channels, l.modes))
end
Lux.initialstates(::AbstractRNG, ::SpectralConv) = NamedTuple()
function (l::SpectralConv)(x, ps, st)
nx_in = size(x, 1); batch = size(x, 3)
x_ft = rfft(x, 1) # (nx÷2+1, ch, batch)
R = complex.(ps.R_real, ps.R_imag)
out_ft = zeros(ComplexF32, size(x_ft, 1), l.out_channels, batch)
for b in 1:batch, k in 1:l.modes,
o in 1:l.out_channels, ic in 1:l.in_channels
out_ft[k, o, b] += R[o, ic, k] * x_ft[k, ic, b]
end
return irfft(out_ft, nx_in, 1), st
end#-----build FNO: lift → 3 Fourier layers → project------
modes = 16
width = 32
lift = Dense(1 => width)
spectral1 = SpectralConv(width, width, modes); skip1 = Dense(width => width)
spectral2 = SpectralConv(width, width, modes); skip2 = Dense(width => width)
spectral3 = SpectralConv(width, width, modes); skip3 = Dense(width => width)
project = Dense(width => 1)
ps_lift, st_lift = Lux.setup(rng, lift)
ps_s1, st_s1 = Lux.setup(rng, spectral1); ps_sk1, st_sk1 = Lux.setup(rng, skip1)
ps_s2, st_s2 = Lux.setup(rng, spectral2); ps_sk2, st_sk2 = Lux.setup(rng, skip2)
ps_s3, st_s3 = Lux.setup(rng, spectral3); ps_sk3, st_sk3 = Lux.setup(rng, skip3)
ps_proj, st_proj = Lux.setup(rng, project)
ps_all = (lift = ps_lift,
s1 = ps_s1, sk1 = ps_sk1,
s2 = ps_s2, sk2 = ps_sk2,
s3 = ps_s3, sk3 = ps_sk3,
proj = ps_proj)
st_all = (lift = st_lift,
s1 = st_s1, sk1 = st_sk1,
s2 = st_s2, sk2 = st_sk2,
s3 = st_s3, sk3 = st_sk3,
proj = st_proj)
function fno_forward(x, ps, st)
nx_in, _, batch = size(x)
v, _ = lift(reshape(x, 1, nx_in * batch), ps.lift, st.lift)
v = reshape(v, nx_in, width, batch)
for (sl, sk, ps_s, ps_sk, st_s, st_sk) in (
(spectral1, skip1, ps.s1, ps.sk1, st.s1, st.sk1),
(spectral2, skip2, ps.s2, ps.sk2, st.s2, st.sk2),
(spectral3, skip3, ps.s3, ps.sk3, st.s3, st.sk3),
)
v_spec, _ = sl(v, ps_s, st_s)
v_local, _ = sk(reshape(v, width, nx_in * batch), ps_sk, st_sk)
v = tanh.(v_spec .+ reshape(v_local, nx_in, width, batch))
end
out, _ = project(reshape(v, width, nx_in * batch), ps.proj, st.proj)
return reshape(out, nx_in, 1, batch), st
end#-----training------
opt_state = Optimisers.setup(Adam(1f-3), ps_all)
X_train = reshape(F_train, nx, 1, n_train); Y_train = reshape(U_train, nx, 1, n_train)
X_test = reshape(F_test, nx, 1, n_test); Y_test = reshape(U_test, nx, 1, n_test)
for epoch in 1:500
(loss,), grads = Zygote.withgradient(ps_all) do ps
pred, _ = fno_forward(X_train, ps, st_all)
mean((pred .- Y_train) .^ 2)
end
opt_state, ps_all = Optimisers.update(opt_state, ps_all, grads[1])
if epoch == 1 || epoch % 100 == 0
pred_test, _ = fno_forward(X_test, ps_all, st_all)
test_mse = mean((pred_test .- Y_test) .^ 2)
@printf "epoch %3d train MSE = %.5f test MSE = %.5f\n" epoch loss test_mse
end
end#-----qualitative check on held-out sources------
fig = Figure(size = (820, 520))
for i in 1:4
f_test, u_ref = generate_pair(Xoshiro(5000 + i))
pred, _ = fno_forward(reshape(f_test, nx, 1, 1), ps_all, st_all)
row, col = fldmod1(i, 2)
ax = Axis(fig[row, col], xlabel = "x", ylabel = "u(x)",
title = "Test source $i")
lines!(ax, x_grid, f_test, color = (:gray, 0.5), label = "f(x)")
lines!(ax, x_grid, u_ref, color = :black, linewidth = 2, label = "FD u(x)")
lines!(ax, x_grid, vec(pred), color = :steelblue, linewidth = 2,
linestyle = :dash, label = "FNO u(x)")
i == 1 && axislegend(ax, position = :rt, labelsize = 9)
end
Label(fig[0, :], "FNO on 1D Poisson: f(x) → u(x)", fontsize = 14)
fig20.4 Resolution invariance
One of FNO’s most remarkable properties is resolution invariance (also called zero-shot super-resolution). Because the learned operator acts on Fourier modes rather than grid points, a model trained at one resolution can be evaluated at a different resolution:
- Train on a \(64\)-point grid.
- Evaluate on a \(256\)-point grid by zero-padding the spectral weights to the new resolution.
This works because the Fourier coefficients have the same physical meaning regardless of discretization — they represent the same spatial frequencies. In practice, FNO maintains good accuracy when evaluating at \(2\)–\(4\times\) the training resolution, with degradation beyond that.
20.5 FNO variants
The original FNO has inspired many extensions:
| Variant | Key idea | Use case |
|---|---|---|
| FNO-2D/3D | Multi-dimensional spectral convolution | 2D/3D PDEs |
| U-FNO (Wen et al., 2022) | U-Net-style skip connections between Fourier layers | Multiphase subsurface flow |
| Geo-FNO (Li et al., 2023) | Learned input deformation for irregular domains | Geophysics, complex geometries |
| AFNO (Adaptive FNO) (Pathak et al., 2022) | Token mixing via adaptive Fourier layers | Global weather forecasting |
| Factorized FNO | Factorize multi-dimensional spectral weights | Reduced memory for 3D problems |
20.6 Comparison with other neural operators
| Feature | FNO | DeepONet | PI-DeepONet |
|---|---|---|---|
| Core mechanism | Spectral convolution | Branch-trunk decomposition | Branch-trunk + PDE loss |
| Grid requirement | Regular grid | Arbitrary | Arbitrary |
| Resolution invariance | Native (via Fourier) | Via trunk re-evaluation | Via trunk re-evaluation |
| Training data | Input-output pairs | Input-output pairs | Physics (+ optional data) |
| Global receptive field | Per layer | Per forward pass | Per forward pass |
| Best suited for | Regular-grid PDEs | Irregular sensors | Limited/no solver data |
20.7 Geoscience milestones
- Global weather forecasting — Pathak et al. (2022) (FourCastNet) is the milestone result for adaptive Fourier neural operators applied to global weather prediction.
- CO₂ storage and subsurface multiphase flow — Wen et al. (2022) (U-FNO) demonstrated that an FNO with U-Net-style skip connections can serve as a real-time surrogate for reservoir-scale CO₂ injection simulations.
- FNO on irregular geometries — Li et al. (2023) (Geo-FNO) extended the Fourier neural operator to the irregular domains typical of Earth science through learned coordinate deformations.
FNO excels when your data lives on regular grids and you need fast, resolution-invariant operator evaluation. If your data is on irregular sensors or you lack training data, consider DeepONet or PI-DeepONet instead.