12  Autoencoders

Key references
  • Deep autoencoders — training deep networks to learn compressed representations for dimensionality reduction (Hinton & Salakhutdinov, 2006).
  • Variational autoencoders (VAE) — a probabilistic framework that combines autoencoders with Bayesian inference to generate new samples (Kingma & Welling, 2014).

This chapter starts the generative-model block of the neural-networks part. The previous chapters focused mostly on architectures for prediction, classification, or representation learning on specific data structures. From here through GANs, diffusion models, and flow matching, the emphasis shifts toward models that learn a data distribution well enough to reconstruct, sample, or generate new realizations.

An autoencoder is a neural network trained to reconstruct its own input. This may sound pointless — why predict something you already have? The key is that the network is forced through a bottleneck: a narrow hidden layer with far fewer neurons than the input dimension. To reconstruct the input from this compressed representation, the network must learn which features are essential and which are noise.

12.1 Architecture

An autoencoder has two parts:

  1. Encoder \(f_\theta\): maps the high-dimensional input \(\mathbf{x}\) to a low-dimensional latent code \(\mathbf{z}\):

\[ \mathbf{z} = f_\theta(\mathbf{x}), \quad \mathbf{z} \in \mathbb{R}^d, \quad d \ll \dim(\mathbf{x}) \]

  2. Decoder \(g_\phi\): reconstructs the input from the latent code:

\[ \hat{\mathbf{x}} = g_\phi(\mathbf{z}) \]

The network is trained to minimize the reconstruction loss:

\[ \mathcal{L} = \|\mathbf{x} - \hat{\mathbf{x}}\|^2 \]

After training, the encoder provides a learned compression (useful for dimensionality reduction, denoising, and feature extraction) and the decoder can generate data from latent codes.
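This encoder/decoder pairing can be sketched directly with the Lux.jl layers used later in this chapter; the 16 → 4 → 16 dimensions here are toy assumptions, not values from the chapter's example:

```julia
using Lux, Random

rng = Xoshiro(0)

# Toy dimensions: 16-dimensional input, 4-dimensional bottleneck
encoder = Dense(16 => 4, tanh)      # f_θ: x ↦ z
decoder = Dense(4 => 16)            # g_ϕ: z ↦ x̂
ae      = Chain(encoder, decoder)

ps, st = Lux.setup(rng, ae)
x = randn(rng, Float32, 16, 10)     # batch of 10 samples, one per column
x̂, _ = ae(x, ps, st)

# Squared-error reconstruction loss from the equation above
recon_loss = sum(abs2, x .- x̂) / length(x)
```

Untrained, the reconstruction is of course poor; training consists of minimizing `recon_loss` with respect to `ps`, exactly as done in Section 12.3.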

12.2 Variational Autoencoder (VAE)

A standard autoencoder maps each input to a single point in latent space. The variational autoencoder (Kingma & Welling, 2014) instead maps each input to a distribution — a mean \(\boldsymbol{\mu}\) and variance \(\boldsymbol{\sigma}^2\) — and samples from that distribution:

\[ \mathbf{z} = \boldsymbol{\mu} + \boldsymbol{\sigma} \odot \boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(0, I) \]

The VAE loss combines reconstruction accuracy with a regularization term that keeps the latent distributions close to a standard normal:

\[ \mathcal{L}_{\text{VAE}} = \underbrace{\|\mathbf{x} - \hat{\mathbf{x}}\|^2}_{\text{reconstruction}} + \underbrace{D_{\text{KL}}\!\bigl(q(\mathbf{z}|\mathbf{x})\, \|\, p(\mathbf{z})\bigr)}_{\text{KL divergence}} \]

This gives the VAE a smooth, continuous latent space that can be sampled to generate new data.
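Both the reparameterization step and the KL term have simple closed forms when \(q(\mathbf{z}|\mathbf{x})\) is a diagonal Gaussian and \(p(\mathbf{z}) = \mathcal{N}(0, I)\). A sketch with hand-picked encoder outputs (the \(\boldsymbol{\mu}\) and \(\log\boldsymbol{\sigma}^2\) values are assumptions, standing in for what the encoder network would produce):

```julia
using Random

rng = Xoshiro(1)

# Pretend the encoder produced these for one input (assumed toy values);
# predicting log σ² rather than σ keeps the variance positive by construction.
μ     = Float32[0.2, -0.5]
logσ² = Float32[-1.0, 0.3]
σ     = exp.(logσ² ./ 2)

# Reparameterization trick: z = μ + σ ⊙ ε keeps the sampling step differentiable
ε = randn(rng, Float32, 2)
z = μ .+ σ .* ε

# Closed-form KL(q ‖ p) for diagonal Gaussians: ½ Σ (μ² + σ² − log σ² − 1)
kl = 0.5f0 * sum(μ .^ 2 .+ σ .^ 2 .- logσ² .- 1)
```

Because the randomness is isolated in \(\boldsymbol{\epsilon}\), gradients flow through \(\boldsymbol{\mu}\) and \(\boldsymbol{\sigma}\), which is what makes the VAE trainable by backpropagation.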

12.3 Code example: denoising autoencoder for geophysical signals

We train a simple autoencoder to denoise a 1D signal — a common preprocessing task in geophysics.

using Lux, Random, Optimisers, Zygote, Statistics, Printf, CairoMakie

rng = Xoshiro(42)

# Generate clean signals and noisy versions
function make_signal(rng, n = 64)
    t = Float32.(range(0, 2π, length = n))
    phase = Float32(2π) * rand(rng, Float32)   # random phase in [0, 2π)
    amp   = 0.5f0 + rand(rng, Float32)         # random amplitude in [0.5, 1.5)
    return amp .* sin.(t .+ phase) .+ (amp * 0.5f0) .* sin.(3 .* t .+ phase)
end

n_samples = 200
sig_len = 64
clean_data = zeros(Float32, sig_len, n_samples)
noisy_data = zeros(Float32, sig_len, n_samples)

for i in 1:n_samples
    c = make_signal(rng, sig_len)
    clean_data[:, i] = c
    noisy_data[:, i] = c .+ 0.3f0 .* randn(rng, Float32, sig_len)
end

# Train/test split
idx = randperm(rng, n_samples)
n_train = Int(round(0.8 * n_samples))
tr = idx[1:n_train]
te = idx[n_train+1:end]

X_train, Y_train = noisy_data[:, tr], clean_data[:, tr]
X_test,  Y_test  = noisy_data[:, te], clean_data[:, te]
# Autoencoder: encoder compresses to latent_dim, decoder reconstructs
latent_dim = 8

encoder = Chain(
    Dense(sig_len => 32, relu),
    Dense(32 => latent_dim, relu)
)

decoder = Chain(
    Dense(latent_dim => 32, relu),
    Dense(32 => sig_len)
)

autoencoder = Chain(encoder, decoder)

ps, st = Lux.setup(rng, autoencoder)

function ae_loss(model, ps, st, data)
    noisy, clean = data
    reconstructed, st_new = model(noisy, ps, st)
    loss = mean((reconstructed .- clean) .^ 2)
    return loss, st_new, ()
end
function train_model(model, ps, st, data; epochs = 500, lr = 0.003f0)
    tstate = Training.TrainState(model, ps, st, Adam(lr))
    for epoch in 1:epochs
        _, loss, _, tstate = Training.single_train_step!(
            AutoZygote(), ae_loss, data, tstate
        )
        if epoch == 1 || epoch % 100 == 0
            @printf "Epoch %4d  MSE = %.6f\n" epoch loss
        end
    end
    return tstate
end

tstate = train_model(autoencoder, ps, st, (X_train, Y_train))

# Holdout reconstruction quality
Y_test_pred, _ = autoencoder(X_test, tstate.parameters, tstate.states)
test_mse = mean((Y_test_pred .- Y_test) .^ 2)
@printf "Holdout reconstruction MSE = %.6f\n" test_mse
Epoch    1  MSE = 6.554671
Epoch  100  MSE = 0.291780
Epoch  200  MSE = 0.066843
Epoch  300  MSE = 0.047740
Epoch  400  MSE = 0.040926
Epoch  500  MSE = 0.037677
Holdout reconstruction MSE = 0.173980
# Test on a new signal
test_rng = Xoshiro(99)
test_clean = make_signal(test_rng, sig_len)
test_noisy = test_clean .+ 0.3f0 .* randn(test_rng, Float32, sig_len)

denoised, _ = autoencoder(reshape(test_noisy, sig_len, 1),
                          tstate.parameters, tstate.states)

fig = Figure(size = (700, 300))
ax = Axis(fig[1, 1], xlabel = "Sample", ylabel = "Amplitude",
          title = "Denoising autoencoder")
lines!(ax, test_noisy, color = (:gray, 0.5), label = "Noisy input")
lines!(ax, test_clean, color = :black, linewidth = 2, label = "Clean signal")
lines!(ax, vec(denoised), color = :steelblue, linewidth = 2,
       linestyle = :dash, label = "Denoised (AE)")
axislegend(ax, position = :rt)
fig

12.4 When to use autoencoders

Task                      Autoencoder variant
Dimensionality reduction  Standard AE
Denoising                 Denoising AE (train on noisy → clean pairs)
Anomaly detection         Train on normal data; a high reconstruction error flags an anomaly
Generative modeling       VAE (sample from the latent space to create new data)
Feature learning          Use the encoder output as features for downstream tasks
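The anomaly-detection row can be sketched end to end: train an autoencoder on normal signals only, set a threshold from the reconstruction errors on that normal data, and flag anything above it. The signal family, network size, and 99th-percentile threshold below are all assumptions for illustration:

```julia
using Lux, Random, Optimisers, Zygote, Statistics

rng = Xoshiro(0)
n = 32

# "Normal" data: phase-shifted sinusoids (assumed toy family)
normal_signal(rng) =
    Float32.(sin.(range(0, 2π, length = n) .+ Float32(2π) * rand(rng, Float32)))
X = reduce(hcat, [normal_signal(rng) for _ in 1:256])

ae = Chain(Dense(n => 8, relu), Dense(8 => n))
ps, st = Lux.setup(rng, ae)

# Train on normal data only, minimizing reconstruction MSE
function train(ae, ps, st, X; epochs = 500, lr = 0.01f0)
    opt = Optimisers.setup(Adam(lr), ps)
    for _ in 1:epochs
        g = first(Zygote.gradient(p -> mean(abs2, first(ae(X, p, st)) .- X), ps))
        opt, ps = Optimisers.update(opt, ps, g)
    end
    return ps
end
ps = train(ae, ps, st, X)

# Threshold: 99th percentile of reconstruction error on the normal data
recon_err(x) = mean(abs2, first(ae(x, ps, st)) .- x)
θ = quantile([recon_err(X[:, i:i]) for i in 1:256], 0.99)

anomaly = randn(rng, Float32, n, 1)     # clearly non-sinusoidal input
is_anomaly = recon_err(anomaly) > θ
```

The same recipe transfers directly to the monitoring-data application in Section 12.5: the model never sees anomalies during training, so anything it cannot reconstruct well is, by construction, unlike the normal data.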

12.5 Geoscience applications

  • Seismic denoising — autoencoders trained to map noisy seismic traces to clean versions, effectively learning the noise characteristics of the acquisition system.
  • Geological model compression — high-dimensional 3D geological property models can be compressed to a low-dimensional latent space using autoencoders, making inversion and uncertainty quantification computationally feasible.
  • Anomaly detection in monitoring data — autoencoders trained on normal operating data (e.g., from geothermal wells or mining sensors) flag anomalies when reconstruction error exceeds a threshold.
  • Subsurface modeling — Lopez-Alvis et al. (2019) used deep autoencoder-based approaches for inverse modeling of subsurface transport, where the autoencoder compresses the parameter space before inversion.
  • Overview — Bergen et al. (2019) discuss the role of representation learning and dimensionality reduction, of which autoencoders are a central tool, across geoscience disciplines.