21 Fourier Neural Operator

using Lux, Random, Optimisers, Zygote, Statistics, Printf, CairoMakie, FFTW
rng = Xoshiro(42)
- Fourier Neural Operator (FNO) — learning solution operators in Fourier space, enabling resolution-invariant surrogate models for PDEs (Li et al., 2021).
- Neural operator survey — a comprehensive mathematical treatment of neural operators and their approximation properties (Kovachki et al., 2023).
- FourCastNet — global weather prediction using adaptive Fourier neural operators (Pathak et al., 2022).
- U-FNO — enhanced FNO with U-Net-style skip connections for multiphase flow in subsurface reservoirs (Wen et al., 2022).
- FNO on general geometries — extending FNO to handle irregular domains via learned deformations (Li et al., 2023).
- Neural operators for science — review of neural operators accelerating scientific simulations (Azizzadenesheli et al., 2024).
DeepONet learned operators through a branch-trunk decomposition. The Fourier Neural Operator (FNO) (Li et al., 2021) takes a fundamentally different approach: it learns the operator directly in Fourier space, where global spatial patterns are captured efficiently and the learned operator is inherently resolution-invariant.
FNO has become arguably the most successful neural operator architecture, powering applications from weather forecasting to fluid dynamics to seismic inversion.
21.1 From convolution to spectral convolution
A standard convolution applies a local kernel:
\[ (K * v)(x) = \int K(x - y)\, v(y)\, dy \]
This has a local receptive field — each output point depends only on nearby inputs. For PDEs, where information propagates globally (e.g., waves, diffusion), stacking many convolutional layers is needed to capture long-range interactions.
By the convolution theorem, convolution in physical space is multiplication in Fourier space:
\[ \mathcal{F}[K * v] = \mathcal{F}[K] \cdot \mathcal{F}[v] \]
The FNO leverages this: instead of learning a spatial kernel, it learns a spectral filter — a weight tensor that multiplies the Fourier coefficients directly. This gives each layer a global receptive field while keeping the computation efficient through the FFT.
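This Fourier-space view is easy to verify numerically. The sketch below checks the convolution theorem on a periodic grid with FFTW; the helper names `circular_conv` and `spectral_conv` are illustrative and not part of the FNO code later in this chapter:

```julia
using FFTW

# Circular (periodic) convolution by direct summation: (k * v)[i] = Σⱼ k[i-j] v[j]
function circular_conv(k::Vector{Float64}, v::Vector{Float64})
    n = length(v)
    return [sum(k[mod1(i - j + 1, n)] * v[j] for j in 1:n) for i in 1:n]
end

# The same convolution via the convolution theorem: multiply in Fourier space
spectral_conv(k, v) = real.(ifft(fft(k) .* fft(v)))

n = 64
k = exp.(-range(0, 5, length = n))      # an arbitrary kernel
v = sin.(2π .* (0:(n - 1)) ./ n)        # an arbitrary periodic signal

maximum(abs.(circular_conv(k, v) .- spectral_conv(k, v)))   # ≈ 0 up to round-off
```

The direct sum costs \(O(n^2)\); the FFT route costs \(O(n \log n)\), which is exactly the efficiency the FNO exploits.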
21.2 FNO architecture
An FNO consists of three stages:
21.2.1 1. Lifting layer
A point-wise linear layer that maps the input from its native dimension to a higher-dimensional representation:
\[ v^{(0)}(x) = P\, a(x) + \mathbf{q} \]
where \(P \in \mathbb{R}^{d_v \times d_a}\) and \(a(x)\) is the input function at location \(x\).
21.2.2 2. Fourier layers (repeated \(L\) times)
Each Fourier layer applies:
\[ v^{(l+1)}(x) = \sigma\!\Bigl(\underbrace{W^{(l)}\, v^{(l)}(x)}_{\text{local linear}} + \underbrace{\mathcal{F}^{-1}\!\bigl[R^{(l)} \cdot \mathcal{F}[v^{(l)}]\bigr](x)}_{\text{spectral convolution}}\Bigr) \]
The spectral convolution keeps only the lowest \(k_{\max}\) Fourier modes and sets the rest to zero. The learned weight tensor \(R^{(l)} \in \mathbb{C}^{d_v \times d_v \times k_{\max}}\) acts as a frequency-dependent linear transformation. Here the superscript \((l)\) plays the same role as elsewhere in the book: it indexes layer depth.
Input v^(l)(x)
      │
      ├──────────────────────────────────┐
      │                                  │
      ▼                                  ▼
     FFT                        Local linear W^(l)
      │                                  │
      ▼                                  │
Truncate to k_max modes                  │
      │                                  │
      ▼                                  │
Multiply by R^(l)                        │
      │                                  │
      ▼                                  │
 Inverse FFT                             │
      │                                  │
      └──────────┬───────────────────────┘
                 │ add
                 ▼
           Activation σ
                 │
                 ▼
          Output v^(l+1)(x)
21.2.3 3. Projection layer
A point-wise linear layer that maps back to the output dimension:
\[ u(x) = Q\, v^{(L)}(x) + r \]
21.2.4 Why this works
- Global receptive field — each Fourier layer sees the entire spatial domain, not just a local neighborhood.
- Resolution invariance — the spectral representation is independent of the grid resolution. A model trained on a \(64 \times 64\) grid can be evaluated on a \(256 \times 256\) grid by zero-padding the Fourier modes.
- Efficiency — the FFT costs \(O(n \log n)\), making the spectral convolution as fast as spatial convolution for typical grid sizes.
- Mode truncation as regularization — keeping only \(k_{\max}\) modes implicitly smooths the learned operator, preventing overfitting to high-frequency noise.
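The last point can be seen directly: keeping the lowest \(k_{\max}\) modes is a sharp low-pass filter that removes high-frequency noise while preserving smooth structure. A minimal sketch (the helper `truncate_modes` is illustrative; it keeps rfft indices 1 through \(k_{\max}\), the same convention the spectral convolution layer uses later in this chapter):

```julia
using FFTW, Random

# Keep only the lowest k_max Fourier modes of a real signal (sharp low-pass)
function truncate_modes(v::Vector{Float64}, k_max::Int)
    v_ft = rfft(v)
    v_ft[(k_max + 1):end] .= 0    # zero all higher frequencies
    return irfft(v_ft, length(v))
end

rng_demo = Xoshiro(0)
n = 128
x = (0:(n - 1)) ./ n
clean = sin.(2π .* x) .+ 0.5 .* cos.(6π .* x)   # frequencies 1 and 3 only
noisy = clean .+ 0.3 .* randn(rng_demo, n)

smoothed = truncate_modes(noisy, 8)
# `smoothed` is much closer to `clean` than `noisy` is: truncation keeps the
# low-frequency signal and discards most of the broadband noise
```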
21.3 Code example: FNO for the 1D advection equation
We train a simple FNO to learn the solution operator for the 1D advection equation:
\[ \frac{\partial u}{\partial t} + c \frac{\partial u}{\partial x} = 0, \quad u(x, 0) = u_0(x) \]
The exact solution is a rightward shift: \(u(x, t) = u_0(x - ct)\). The operator maps \(u_0 \mapsto u(\cdot, T)\).
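In Fourier space this operator is particularly simple: by the shift theorem, advection multiplies mode \(k\) by \(e^{-2\pi i k cT}\) — precisely the kind of frequency-wise linear map the FNO's \(R\) tensor can represent. A quick numerical check (illustrative, separate from the training code; `_demo` names avoid clashing with the chapter's variables):

```julia
using FFTW

nx_demo = 64
cT = 0.5                                   # c * T for c = 1, T = 0.5
x_demo = (0:(nx_demo - 1)) ./ nx_demo
u0_demo = sin.(2π .* x_demo) .+ 0.25 .* cos.(6π .* x_demo)

# Exact advected profile u(x, T) = u0(x - cT)
uT_exact = sin.(2π .* (x_demo .- cT)) .+ 0.25 .* cos.(6π .* (x_demo .- cT))

# Shift theorem: multiply Fourier mode k by exp(-2πi k cT)
u0_ft = rfft(u0_demo)
ks = 0:(length(u0_ft) - 1)
uT_spectral = irfft(u0_ft .* exp.(-2π * im .* ks .* cT), nx_demo)

maximum(abs.(uT_spectral .- uT_exact))   # ≈ 0 up to round-off
```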
# Generate training data: advection with periodic BCs
nx = 64
c = 1.0f0
T_final = 0.5f0
dx = 1.0f0 / nx
x_grid = Float32.(range(0, 1 - dx, length = nx))

function generate_pair(rng)
    # Random initial condition: sum of sinusoids
    n_modes = rand(rng, 2:5)
    u0 = zeros(Float32, nx)
    for _ in 1:n_modes
        k = rand(rng, 1:6)
        a = 0.5f0 * randn(rng, Float32)
        ϕ = 2π * rand(rng, Float32)
        u0 .+= a .* sin.(2π .* k .* x_grid .+ ϕ)
    end
    # Exact solution at t = T: rightward shift by c*T (periodic),
    # so uT[i] = u0[i - shift]
    shift = round(Int, c * T_final / dx)
    uT = circshift(u0, shift)
    return u0, uT
end

n_train = 500
n_test = 100
U0_train = zeros(Float32, nx, n_train)
UT_train = zeros(Float32, nx, n_train)
for i in 1:n_train
    u0, uT = generate_pair(rng)
    U0_train[:, i] = u0
    UT_train[:, i] = uT
end
U0_test = zeros(Float32, nx, n_test)
UT_test = zeros(Float32, nx, n_test)
for i in 1:n_test
    u0, uT = generate_pair(Xoshiro(9000 + i))
    U0_test[:, i] = u0
    UT_test[:, i] = uT
end

# Spectral convolution layer (custom Lux layer)
struct SpectralConv <: Lux.AbstractLuxLayer
    in_channels::Int
    out_channels::Int
    modes::Int
end

function Lux.initialparameters(rng::AbstractRNG, l::SpectralConv)
    scale = 1.0f0 / (l.in_channels * l.out_channels)
    R_real = scale .* randn(rng, Float32, l.out_channels, l.in_channels, l.modes)
    R_imag = scale .* randn(rng, Float32, l.out_channels, l.in_channels, l.modes)
    return (R_real = R_real, R_imag = R_imag)
end

Lux.initialstates(::AbstractRNG, ::SpectralConv) = NamedTuple()

function (l::SpectralConv)(x, ps, st)
    # x: (nx, channels, batch)
    nx_in = size(x, 1)
    batch = size(x, 3)
    # FFT along spatial dimension
    x_ft = rfft(x, 1)                            # (nx÷2+1, channels, batch)
    # Learned weights for the first `modes` frequencies
    R = complex.(ps.R_real, ps.R_imag)           # (out_ch, in_ch, modes)
    # Apply the frequency-wise linear maps without in-place mutation,
    # which Zygote cannot differentiate through
    out_low = reduce(vcat,
        [reshape(R[:, :, k] * x_ft[k, :, :], 1, l.out_channels, batch)
         for k in 1:l.modes])                    # (modes, out_ch, batch)
    # Zero the remaining (truncated) frequencies
    out_ft = vcat(out_low,
        zeros(ComplexF32, size(x_ft, 1) - l.modes, l.out_channels, batch))
    # Inverse FFT
    out = irfft(out_ft, nx_in, 1)
    return out, st
end

# Build FNO: lift → spectral layers → project
modes = 12
width = 16
lift = Dense(1 => width)
spectral1 = SpectralConv(width, width, modes)
skip1 = Dense(width => width)
spectral2 = SpectralConv(width, width, modes)
skip2 = Dense(width => width)
spectral3 = SpectralConv(width, width, modes)
skip3 = Dense(width => width)
project = Dense(width => 1)
ps_lift, st_lift = Lux.setup(rng, lift)
ps_s1, st_s1 = Lux.setup(rng, spectral1)
ps_sk1, st_sk1 = Lux.setup(rng, skip1)
ps_s2, st_s2 = Lux.setup(rng, spectral2)
ps_sk2, st_sk2 = Lux.setup(rng, skip2)
ps_s3, st_s3 = Lux.setup(rng, spectral3)
ps_sk3, st_sk3 = Lux.setup(rng, skip3)
ps_proj, st_proj = Lux.setup(rng, project)
ps_all = (lift = ps_lift,
          s1 = ps_s1, sk1 = ps_sk1,
          s2 = ps_s2, sk2 = ps_sk2,
          s3 = ps_s3, sk3 = ps_sk3,
          proj = ps_proj)
st_all = (lift = st_lift,
          s1 = st_s1, sk1 = st_sk1,
          s2 = st_s2, sk2 = st_sk2,
          s3 = st_s3, sk3 = st_sk3,
          proj = st_proj)
# Point-wise layers act on channels: (nx, ch, batch) ⇄ (ch, nx*batch).
# A plain reshape would scramble the spatial and channel axes, so permute first.
to_channels(v) = reshape(permutedims(v, (2, 1, 3)), size(v, 2), :)
from_channels(v, nx, batch) = permutedims(reshape(v, size(v, 1), nx, batch), (2, 1, 3))

function fno_forward(x, ps, st)
    # x: (nx, 1, batch)
    nx_in, _, batch = size(x)
    # Lift to higher dimension
    v, st_l = lift(to_channels(x), ps.lift, st.lift)
    v = from_channels(v, nx_in, batch)
    # Fourier layer 1
    v_spec, st_s1 = spectral1(v, ps.s1, st.s1)
    v_local, st_sk1 = skip1(to_channels(v), ps.sk1, st.sk1)
    v = relu.(v_spec .+ from_channels(v_local, nx_in, batch))
    # Fourier layer 2
    v_spec2, st_s2 = spectral2(v, ps.s2, st.s2)
    v_local2, st_sk2 = skip2(to_channels(v), ps.sk2, st.sk2)
    v = relu.(v_spec2 .+ from_channels(v_local2, nx_in, batch))
    # Fourier layer 3
    v_spec3, st_s3 = spectral3(v, ps.s3, st.s3)
    v_local3, st_sk3 = skip3(to_channels(v), ps.sk3, st.sk3)
    v = relu.(v_spec3 .+ from_channels(v_local3, nx_in, batch))
    # Project back to one output channel
    out, st_p = project(to_channels(v), ps.proj, st.proj)
    out = reshape(out, nx_in, 1, batch)
    st_new = (lift = st_l,
              s1 = st_s1, sk1 = st_sk1,
              s2 = st_s2, sk2 = st_sk2,
              s3 = st_s3, sk3 = st_sk3,
              proj = st_p)
    return out, st_new
end

# Training loop
opt = Adam(0.001f0)
opt_state = Optimisers.setup(opt, ps_all)
X_train = reshape(U0_train, nx, 1, n_train)
Y_train = reshape(UT_train, nx, 1, n_train)
X_test = reshape(U0_test, nx, 1, n_test)
Y_test = reshape(UT_test, nx, 1, n_test)
for epoch in 1:500
    (loss, st_new), grads = Zygote.withgradient(ps_all) do ps
        pred, st_ = fno_forward(X_train, ps, st_all)
        l = mean((pred .- Y_train) .^ 2)
        (l, st_)
    end
    opt_state, ps_all = Optimisers.update(opt_state, ps_all, grads[1])
    st_all = st_new
    if epoch == 1 || epoch % 100 == 0
        pred_test, _ = fno_forward(X_test, ps_all, st_all)
        test_mse = mean((pred_test .- Y_test) .^ 2)
        @printf "Epoch %3d  Train MSE = %.6f  Test MSE = %.6f\n" epoch loss test_mse
    end
end

# Test on new initial conditions
n_show = 4
fig = Figure(size = (700, 500))
for i in 1:n_show
    u0_test, uT_test = generate_pair(Xoshiro(5000 + i))
    x_in = reshape(u0_test, nx, 1, 1)
    pred, _ = fno_forward(x_in, ps_all, st_all)
    row = (i - 1) ÷ 2 + 1
    col = (i - 1) % 2 + 1
    ax = Axis(fig[row, col], title = "Test $i",
              xlabel = "x", ylabel = "u")
    lines!(ax, x_grid, u0_test, color = (:gray, 0.5), label = "u₀")
    lines!(ax, x_grid, uT_test, color = :black, linewidth = 2, label = "Exact u(T)")
    lines!(ax, x_grid, vec(pred), color = :steelblue, linewidth = 2,
           linestyle = :dash, label = "FNO prediction")
    if i == 1
        axislegend(ax, position = :rt, labelsize = 10)
    end
end
Label(fig[0, :], "FNO: advection equation — generalizing to new initial conditions",
      fontsize = 14)
fig

21.4 Resolution invariance
One of FNO’s most remarkable properties is resolution invariance (also called zero-shot super-resolution). Because the learned operator acts on Fourier modes rather than grid points, a model trained at one resolution can be evaluated at a different resolution:
- Train on a \(64\)-point grid.
- Evaluate on a \(256\)-point grid: the learned weights still act on the lowest \(k_{\max}\) Fourier modes, while the finer grid's additional high-frequency modes are simply left at zero — no retraining or interpolation needed.
This works because the Fourier coefficients have the same physical meaning regardless of discretization — they represent the same spatial frequencies. In practice, FNO maintains good accuracy when evaluating at \(2\)–\(4\times\) the training resolution, with degradation beyond that.
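Zero-padding in Fourier space is classical spectral interpolation, which this sketch demonstrates without any learned weights (the helper name `spectral_upsample` is illustrative): a band-limited signal sampled at 64 points is recovered exactly at 256 points.

```julia
using FFTW

# Upsample a real periodic signal by zero-padding its rfft coefficients
function spectral_upsample(v::Vector{Float64}, n_new::Int)
    n = length(v)
    v_ft = rfft(v)                                    # n÷2+1 coefficients
    padded = vcat(v_ft, zeros(ComplexF64, n_new ÷ 2 + 1 - length(v_ft)))
    # Rescale: the unnormalized inverse FFT divides by the (larger) output length
    return irfft(padded, n_new) .* (n_new / n)
end

n_coarse, n_fine = 64, 256
x_c = (0:(n_coarse - 1)) ./ n_coarse
x_f = (0:(n_fine - 1)) ./ n_fine
f(x) = sin(2π * x) + 0.3 * cos(8π * x)    # band-limited: modes 1 and 4 only

u_fine = spectral_upsample(f.(x_c), n_fine)
maximum(abs.(u_fine .- f.(x_f)))   # ≈ 0: exact, since f is band-limited
```

Real FNO inputs are not perfectly band-limited, which is why accuracy gradually degrades far beyond the training resolution.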
21.5 FNO variants
The original FNO has inspired many extensions:
| Variant | Key idea | Use case |
|---|---|---|
| FNO-2D/3D | Multi-dimensional spectral convolution | 2D/3D PDEs |
| U-FNO (Wen et al., 2022) | U-Net-style skip connections between Fourier layers | Multiphase subsurface flow |
| Geo-FNO (Li et al., 2023) | Learned input deformation for irregular domains | Geophysics, complex geometries |
| AFNO (Adaptive FNO) (Pathak et al., 2022) | Token mixing via adaptive Fourier layers | Global weather forecasting |
| Factorized FNO | Factorize multi-dimensional spectral weights | Reduced memory for 3D problems |
21.6 Comparison with other neural operators
| Feature | FNO | DeepONet | PI-DeepONet |
|---|---|---|---|
| Core mechanism | Spectral convolution | Branch-trunk decomposition | Branch-trunk + PDE loss |
| Grid requirement | Regular grid | Arbitrary | Arbitrary |
| Resolution invariance | Native (via Fourier) | Via trunk re-evaluation | Via trunk re-evaluation |
| Training data | Input-output pairs | Input-output pairs | Physics (+ optional data) |
| Global receptive field | Per layer | Per forward pass | Per forward pass |
| Best suited for | Regular-grid PDEs | Irregular sensors | Limited/no solver data |
21.7 Geoscience applications
FNO is transforming geoscience workflows where repeated forward modeling is the computational bottleneck:
- Weather forecasting — FourCastNet (Pathak et al., 2022) used adaptive Fourier neural operators for global weather prediction, producing 10-day forecasts in seconds. Pangu-Weather (Bi et al., 2023) achieved competitive medium-range forecasting with 3D neural architectures.
- Seismic inversion surrogates — FNO-based surrogates replace expensive wave-equation solves during seismic full-waveform inversion, achieving orders-of-magnitude speedup (Yin et al., 2023).
- Subsurface multiphase flow — U-FNO (Wen et al., 2022) was applied to CO₂ storage simulations, learning the mapping from injection scenarios to pressure and saturation fields for real-time reservoir management.
- Seismic wave simulation — FNO learns the mapping from velocity models to wavefields, enabling rapid scenario testing for seismic hazard assessment (Song & Alkhalifah, 2023).
- General geometries — Geo-FNO (Li et al., 2023) handles the irregular domains common in Earth science (topography, coastlines, geological boundaries) through learned coordinate deformations.
FNO excels when your data lives on regular grids and you need fast, resolution-invariant operator evaluation. If your data is on irregular sensors or you lack training data, consider DeepONet or PI-DeepONet instead.