12  Autoencoders

Key references
  • Deep autoencoders — training deep networks to learn compressed representations for dimensionality reduction (Hinton & Salakhutdinov, 2006).
  • Variational autoencoders (VAE) — a probabilistic framework that combines autoencoders with Bayesian inference to generate new samples (Kingma & Welling, 2014).

This chapter starts the generative-model block of the neural-networks part. The previous chapters focused mostly on architectures for prediction, classification, or representation learning on specific data structures. From here through GANs, diffusion models, and flow matching, the emphasis shifts toward models that learn a data distribution well enough to reconstruct, sample, or generate new realizations.

An autoencoder is a neural network trained to reconstruct its own input. This may sound pointless — why predict something you already have? The key is that the network is forced through a bottleneck: a narrow hidden layer with far fewer neurons than the input dimension. To reconstruct the input from this compressed representation, the network must learn which features are essential and which are noise.

12.1 Architecture

An autoencoder has two parts:

  1. Encoder \(f_\theta\): maps the high-dimensional input \(\mathbf{x}\) to a low-dimensional latent code \(\mathbf{z}\):

\[ \mathbf{z} = f_\theta(\mathbf{x}), \quad \mathbf{z} \in \mathbb{R}^d, \quad d \ll \dim(\mathbf{x}) \]

  2. Decoder \(g_\phi\): reconstructs the input from the latent code:

\[ \hat{\mathbf{x}} = g_\phi(\mathbf{z}) \]

The network is trained to minimize the reconstruction loss:

\[ \mathcal{L} = \|\mathbf{x} - \hat{\mathbf{x}}\|^2 \]

After training, the encoder provides a learned compression (useful for dimensionality reduction, denoising, and feature extraction) and the decoder can generate data from latent codes.
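This encoder/decoder pairing can be sketched directly with the Lux.jl layers used later in this chapter; the 16 → 4 → 16 dimensions here are toy assumptions, not values from the chapter's example:

```julia
using Lux, Random

rng = Xoshiro(0)

# Toy dimensions: 16-dimensional input, 4-dimensional bottleneck
encoder = Dense(16 => 4, tanh)      # f_θ: x ↦ z
decoder = Dense(4 => 16)            # g_ϕ: z ↦ x̂
ae      = Chain(encoder, decoder)

ps, st = Lux.setup(rng, ae)
x = randn(rng, Float32, 16, 10)     # batch of 10 samples, one per column
x̂, _ = ae(x, ps, st)

# Squared-error reconstruction loss from the equation above
recon_loss = sum(abs2, x .- x̂) / length(x)
```

Untrained, the reconstruction is of course poor; training consists of minimizing `recon_loss` with respect to `ps`, exactly as done in Section 12.3.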

12.2 Variational Autoencoder (VAE)

A standard autoencoder maps each input to a single point in latent space. The variational autoencoder (Kingma & Welling, 2014) instead maps each input to a distribution — a mean \(\boldsymbol{\mu}\) and variance \(\boldsymbol{\sigma}^2\) — and samples from that distribution:

\[ \mathbf{z} = \boldsymbol{\mu} + \boldsymbol{\sigma} \odot \boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(0, I) \]

The VAE loss combines reconstruction accuracy with a regularization term that keeps the latent distributions close to a standard normal:

\[ \mathcal{L}_{\text{VAE}} = \underbrace{\|\mathbf{x} - \hat{\mathbf{x}}\|^2}_{\text{reconstruction}} + \underbrace{D_{\text{KL}}\!\bigl(q(\mathbf{z}|\mathbf{x})\, \|\, p(\mathbf{z})\bigr)}_{\text{KL divergence}} \]

This gives the VAE a smooth, continuous latent space that can be sampled to generate new data.
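Both the reparameterization step and the KL term have simple closed forms when \(q(\mathbf{z}|\mathbf{x})\) is a diagonal Gaussian and \(p(\mathbf{z}) = \mathcal{N}(0, I)\). A sketch with hand-picked encoder outputs (the \(\boldsymbol{\mu}\) and \(\log\boldsymbol{\sigma}^2\) values are assumptions, standing in for what the encoder network would produce):

```julia
using Random

rng = Xoshiro(1)

# Pretend the encoder produced these for one input (assumed toy values);
# predicting log σ² rather than σ keeps the variance positive by construction.
μ     = Float32[0.2, -0.5]
logσ² = Float32[-1.0, 0.3]
σ     = exp.(logσ² ./ 2)

# Reparameterization trick: z = μ + σ ⊙ ε keeps the sampling step differentiable
ε = randn(rng, Float32, 2)
z = μ .+ σ .* ε

# Closed-form KL(q ‖ p) for diagonal Gaussians: ½ Σ (μ² + σ² − log σ² − 1)
kl = 0.5f0 * sum(μ .^ 2 .+ σ .^ 2 .- logσ² .- 1)
```

Because the randomness is isolated in \(\boldsymbol{\epsilon}\), gradients flow through \(\boldsymbol{\mu}\) and \(\boldsymbol{\sigma}\), which is what makes the VAE trainable by backpropagation.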

12.3 Code example: denoising autoencoder for geophysical signals

We train a simple autoencoder to denoise a 1D signal — a common preprocessing task in geophysics.

using Lux, Random, Optimisers, Zygote, Statistics, Printf, CairoMakie

rng = Xoshiro(42)

# Generate clean signals and noisy versions
function make_signal(rng, n = 64)
    t = Float32.(range(0, 2π, length = n))
    phase = Float32(2π) * rand(rng, Float32)   # random phase in [0, 2π)
    amp   = 0.5f0 + rand(rng, Float32)         # random amplitude in [0.5, 1.5)
    return amp .* sin.(t .+ phase) .+ (amp * 0.5f0) .* sin.(3 .* t .+ phase)
end

n_samples = 200
sig_len = 64
clean_data = zeros(Float32, sig_len, n_samples)
noisy_data = zeros(Float32, sig_len, n_samples)

for i in 1:n_samples
    c = make_signal(rng, sig_len)
    clean_data[:, i] = c
    noisy_data[:, i] = c .+ 0.3f0 .* randn(rng, Float32, sig_len)
end

# Train/test split
idx = randperm(rng, n_samples)
n_train = Int(round(0.8 * n_samples))
tr = idx[1:n_train]
te = idx[n_train+1:end]

X_train, Y_train = noisy_data[:, tr], clean_data[:, tr]
X_test,  Y_test  = noisy_data[:, te], clean_data[:, te]
# Autoencoder: encoder compresses to latent_dim, decoder reconstructs
latent_dim = 8

encoder = Chain(
    Dense(sig_len => 32, relu),
    Dense(32 => latent_dim, relu)
)

decoder = Chain(
    Dense(latent_dim => 32, relu),
    Dense(32 => sig_len)
)

autoencoder = Chain(encoder, decoder)

ps, st = Lux.setup(rng, autoencoder)

function ae_loss(model, ps, st, data)
    noisy, clean = data
    reconstructed, st_new = model(noisy, ps, st)
    loss = mean((reconstructed .- clean) .^ 2)
    return loss, st_new, ()
end
function train_model(model, ps, st, data; epochs = 500, lr = 0.003f0)
    tstate = Training.TrainState(model, ps, st, Adam(lr))
    for epoch in 1:epochs
        _, loss, _, tstate = Training.single_train_step!(
            AutoZygote(), ae_loss, data, tstate
        )
        if epoch == 1 || epoch % 100 == 0
            @printf "Epoch %4d  MSE = %.6f\n" epoch loss
        end
    end
    return tstate
end

tstate = train_model(autoencoder, ps, st, (X_train, Y_train))

# Holdout reconstruction quality
Y_test_pred, _ = autoencoder(X_test, tstate.parameters, tstate.states)
test_mse = mean((Y_test_pred .- Y_test) .^ 2)
@printf "Holdout reconstruction MSE = %.6f\n" test_mse
Epoch    1  MSE = 6.554671
Epoch  100  MSE = 0.291780
Epoch  200  MSE = 0.066843
Epoch  300  MSE = 0.047740
Epoch  400  MSE = 0.040926
Epoch  500  MSE = 0.037677
Holdout reconstruction MSE = 0.173980
# Test on a new signal
test_rng = Xoshiro(99)
test_clean = make_signal(test_rng, sig_len)
test_noisy = test_clean .+ 0.3f0 .* randn(test_rng, Float32, sig_len)

denoised, _ = autoencoder(reshape(test_noisy, sig_len, 1),
                          tstate.parameters, tstate.states)

fig = Figure(size = (700, 300))
ax = Axis(fig[1, 1], xlabel = "Sample", ylabel = "Amplitude",
          title = "Denoising autoencoder")
lines!(ax, test_noisy, color = (:gray, 0.5), label = "Noisy input")
lines!(ax, test_clean, color = :black, linewidth = 2, label = "Clean signal")
lines!(ax, vec(denoised), color = :steelblue, linewidth = 2,
       linestyle = :dash, label = "Denoised (AE)")
axislegend(ax, position = :rt)
fig

12.4 When to use autoencoders

Task                      Autoencoder variant
Dimensionality reduction  Standard AE
Denoising                 Denoising AE (train on noisy → clean pairs)
Anomaly detection         Train on normal data; a high reconstruction error flags an anomaly
Generative modeling       VAE (sample from the latent space to create new data)
Feature learning          Use the encoder output as features for downstream tasks
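The anomaly-detection row can be sketched end to end: train an autoencoder on normal signals only, set a threshold from the reconstruction errors on that normal data, and flag anything above it. The signal family, network size, and 99th-percentile threshold below are all assumptions for illustration:

```julia
using Lux, Random, Optimisers, Zygote, Statistics

rng = Xoshiro(0)
n = 32

# "Normal" data: phase-shifted sinusoids (assumed toy family)
normal_signal(rng) =
    Float32.(sin.(range(0, 2π, length = n) .+ Float32(2π) * rand(rng, Float32)))
X = reduce(hcat, [normal_signal(rng) for _ in 1:256])

ae = Chain(Dense(n => 8, relu), Dense(8 => n))
ps, st = Lux.setup(rng, ae)

# Train on normal data only, minimizing reconstruction MSE
function train(ae, ps, st, X; epochs = 500, lr = 0.01f0)
    opt = Optimisers.setup(Adam(lr), ps)
    for _ in 1:epochs
        g = first(Zygote.gradient(p -> mean(abs2, first(ae(X, p, st)) .- X), ps))
        opt, ps = Optimisers.update(opt, ps, g)
    end
    return ps
end
ps = train(ae, ps, st, X)

# Threshold: 99th percentile of reconstruction error on the normal data
recon_err(x) = mean(abs2, first(ae(x, ps, st)) .- x)
θ = quantile([recon_err(X[:, i:i]) for i in 1:256], 0.99)

anomaly = randn(rng, Float32, n, 1)     # clearly non-sinusoidal input
is_anomaly = recon_err(anomaly) > θ
```

The same recipe transfers directly to the monitoring-data application in Section 12.5: the model never sees anomalies during training, so anything it cannot reconstruct well is, by construction, unlike the normal data.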

12.5 Geoscience applications

  • Seismic denoising — autoencoders trained to map noisy seismic traces to clean versions, effectively learning the noise characteristics of the acquisition system.
  • Geological model compression — high-dimensional 3D geological property models can be compressed to a low-dimensional latent space using autoencoders, making inversion and uncertainty quantification computationally feasible.
  • Anomaly detection in monitoring data — autoencoders trained on normal operating data (e.g., from geothermal wells or mining sensors) flag anomalies when reconstruction error exceeds a threshold.
  • Subsurface modeling — Lopez-Alvis et al. (2019) used deep autoencoder-based approaches for inverse modeling of subsurface transport, where the autoencoder compresses the parameter space before inversion.
  • Overview — Bergen et al. (2019) discuss the role of representation learning and dimensionality reduction, of which autoencoders are a central tool, across geoscience disciplines.