LeNet — the first successful convolutional neural network for image recognition (LeCun et al., 1989).
AlexNet — deep CNN that won the ImageNet competition and launched the deep-learning era (Krizhevsky et al., 2012).
U-Net — encoder-decoder architecture with skip connections for dense prediction (Ronneberger et al., 2015).
ResNet — residual connections enabling very deep networks (100+ layers) (He et al., 2016).
A convolutional neural network (CNN) exploits the spatial structure in data — the fact that nearby pixels or grid cells tend to be related. Instead of connecting every input to every neuron, a CNN slides small learned filters across the data, detecting local patterns such as edges, textures, and shapes. This makes CNNs far more parameter-efficient than feedforward networks for image-like data.
8.1 The convolution operation
In a CNN, a filter (or kernel) is a small weight matrix, typically \(3 \times 3\) or \(5 \times 5\). The filter slides across the input and at each position computes a dot product between the filter weights and the local patch of input values. This produces a feature map — a new grid where each cell represents how strongly that local pattern was detected at that position.
For a 2D input \(\mathbf{X}\) and a filter \(\mathbf{K}\) of size \(k \times k\), the convolution at position \((i, j)\) is:

\[ (\mathbf{X} * \mathbf{K})_{ij} = \sum_{m=1}^{k} \sum_{n=1}^{k} X_{i+m-1,\, j+n-1}\, K_{m,n} \]

(Strictly speaking this is cross-correlation — the filter is not flipped — which is the convention in deep learning; the distinction is immaterial because the filter weights are learned.)
A convolutional layer applies many such filters in parallel, each learning to detect a different pattern.
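To make the operation concrete, here is a minimal plain-Julia sketch (conv2d_valid is our own helper, not a library function; like deep-learning frameworks, it computes cross-correlation without flipping the filter, and uses no padding):

```julia
# Minimal "valid" 2D convolution: slide a k×k filter over the input and
# take the dot product with the local patch at each position.
function conv2d_valid(X::AbstractMatrix, K::AbstractMatrix)
    k = size(K, 1)
    H, W = size(X)
    out = zeros(eltype(X), H - k + 1, W - k + 1)
    for i in axes(out, 1), j in axes(out, 2)
        out[i, j] = sum(X[i:i+k-1, j:j+k-1] .* K)
    end
    return out
end

# An image with a sharp vertical boundary (dark left, bright right) ...
X = Float32[c <= 3 ? 0 : 1 for r in 1:5, c in 1:6]
# ... and a filter that responds to exactly that kind of edge.
K = Float32[-1 0 1; -1 0 1; -1 0 1]
conv2d_valid(X, K)   # 3×4 feature map; large values mark the edge columns
```

The feature map peaks where the filter straddles the boundary and is zero in the flat regions, which is exactly the "pattern detector" behaviour described above.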
8.2 Pooling
After convolution, pooling layers reduce the spatial size of the feature maps, keeping only the most important information. The most common type is max pooling, which takes the maximum value in each small window (e.g., \(2 \times 2\)). Pooling reduces computation, provides some translation invariance, and increases the receptive field of deeper layers.
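A \(2 \times 2\) max pool with stride 2 can be sketched in a few lines of plain Julia (maxpool2x2 is our own helper, not a library function):

```julia
# 2×2 max pooling with stride 2: keep the maximum of each
# non-overlapping 2×2 window, halving height and width.
function maxpool2x2(X::AbstractMatrix)
    H, W = size(X)
    out = similar(X, H ÷ 2, W ÷ 2)
    for i in axes(out, 1), j in axes(out, 2)
        out[i, j] = maximum(X[2i-1:2i, 2j-1:2j])
    end
    return out
end

A = Float32[1 2 5 6;
            3 4 7 8;
            9 1 2 3;
            1 1 4 0]
maxpool2x2(A)   # → [4 8; 9 4]
```

Each output cell records only whether a strong activation occurred somewhere in its window, which is why small shifts of the input pattern leave the pooled output unchanged.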
8.3 A typical CNN architecture
A CNN usually alternates convolutional and pooling layers, progressively reducing spatial resolution while increasing the number of feature channels:
Input — e.g., a \(28 \times 28\) single-channel image.
Conv → ReLU → Pool — repeated 2–3 times.
Flatten — reshape the 2D feature maps into a 1D vector.
Dense layers — one or two fully connected layers for the final prediction.
8.4 Worked example: classifying synthetic seismic patches
We use a standard 2D CNN to classify tiny synthetic seismic-style images into three classes: layered horizons, a faulted horizon pattern, and a dome-like structure. This example is easier to read than the previous texture example because the three patterns are visually distinct and the problem matches the usual CNN story: take an image as input, return one class label as output.
The problem formulation is simple: each input is one \(32 \times 32\) grayscale image patch, and the target is one of three structural classes. The network output is a vector of three class scores. After a softmax, those scores become class probabilities, and the largest probability gives the predicted class.
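The scores-to-probabilities step can be sketched in plain Julia (softmax_vec is our own helper name; subtracting the maximum score first keeps the exponentials numerically stable):

```julia
# Numerically stable softmax: exponentiate shifted scores, then normalize.
function softmax_vec(scores)
    ex = exp.(scores .- maximum(scores))
    ex ./ sum(ex)
end

scores = Float32[2.0, 0.5, -1.0]   # raw class scores from the network
probs  = softmax_vec(scores)       # probabilities that sum to 1
argmax(probs)                      # → 1, the index of the predicted class
```

Note that softmax preserves the ordering of the scores, so the largest score always becomes the largest probability.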
This is still a toy problem. Real seismic interpretation is not this clean. But for a first CNN example, it is useful because we can clearly see what the network is trying to separate and we can directly check whether the predictions match the visible pattern in the image.
using Lux, Random, Optimisers, Zygote, Statistics, Printf, CairoMakie

rng = Xoshiro(42)
class_names = ["Layered", "Faulted", "Dome"]

function gaussian2d(x, y, μx, μy, σx, σy)
    exp.(-0.5f0 .* (((x .- μx) ./ σx) .^ 2 .+ ((y .- μy) ./ σy) .^ 2))
end

# Generate a small synthetic seismic-style image patch.
function make_seismic_patch(rng, n = 32)
    class_id = rand(rng, 1:3)
    x = Float32.(range(-1, 1, length = n))
    y = Float32.(range(-1, 1, length = n))
    xx = repeat(reshape(x, n, 1), 1, n)
    yy = repeat(reshape(y, 1, n), n, 1)
    image = zeros(Float32, n, n)
    if class_id == 1
        # Layered reflectors.
        image .= 0.50f0 .+ 0.22f0 .* sin.(Float32(7.0) .* Float32(pi) .*
                 (yy .+ 0.04f0 .* sin.(Float32(2.0) .* Float32(pi) .* xx)))
        image .+= 0.015f0 .* randn(rng, Float32, n, n)
    elseif class_id == 2
        # Faulted reflectors with a visible offset.
        shifted_yy = yy .+ 0.22f0 .* (xx .> 0.08f0)
        image .= 0.50f0 .+ 0.22f0 .* sin.(Float32(7.0) .* Float32(pi) .* shifted_yy)
        image .-= 0.12f0 .* gaussian2d(xx, yy, 0.08f0, 0.0f0, 0.03f0, 0.85f0)
        image .+= 0.015f0 .* randn(rng, Float32, n, n)
    else
        # Dome-like reflector geometry.
        dome = yy .+ 0.55f0 .* exp.(-((xx ./ 0.42f0) .^ 2))
        image .= 0.50f0 .+ 0.22f0 .* sin.(Float32(7.0) .* Float32(pi) .* dome)
        image .+= 0.015f0 .* randn(rng, Float32, n, n)
    end
    image = Float32.(clamp.(image, 0.03f0, 0.98f0))
    patch = reshape(image, n, n, 1)
    y = zeros(Float32, 3)
    y[class_id] = 1.0f0
    return patch, y, class_id
end

# Create a labelled image dataset.
n_samples = 900
patches = zeros(Float32, 32, 32, 1, n_samples)   # (height, width, channels, batch)
labels = zeros(Float32, 3, n_samples)
for i in 1:n_samples
    x, y, _ = make_seismic_patch(rng)
    patches[:, :, :, i] .= x
    labels[:, i] .= y
end

# Train/test split.
idx = randperm(rng, n_samples)
n_train = Int(round(0.8 * n_samples))
tr = idx[1:n_train]
te = idx[n_train+1:end]
X_train, Y_train = patches[:, :, :, tr], labels[:, tr]
X_test, Y_test = patches[:, :, :, te], labels[:, te]
The input tensor has shape (height, width, channels, batch), and the label for each patch is a one-hot vector with three entries. So this is just standard three-class image classification with geoscience-flavored patterns.
# Build a small 2D CNN classifier.
model = Chain(
    Conv((5, 5), 1 => 8, relu; pad = SamePad()),
    MaxPool((2, 2)),
    Conv((3, 3), 8 => 16, relu; pad = SamePad()),
    MaxPool((2, 2)),
    WrappedFunction(x -> reshape(x, :, size(x, 4))),
    Dense(16 * 8 * 8 => 24, relu),
    Dense(24 => 3),
)
ps, st = Lux.setup(rng, model)

function softmax_cols(x)
    x_shift = x .- maximum(x, dims = 1)
    ex = exp.(x_shift)
    ex ./ sum(ex, dims = 1)
end

function cross_entropy_loss(model, ps, st, data)
    x, y = data
    logits, st_new = model(x, ps, st)
    ŷ = softmax_cols(logits)
    ε = 1.0f-7
    loss = -mean(sum(y .* log.(ŷ .+ ε), dims = 1))
    return loss, st_new, ()
end

function predicted_classes(probabilities)
    [findmax(probabilities[:, i])[2] for i in axes(probabilities, 2)]
end

function true_classes(labels)
    [findmax(labels[:, i])[2] for i in axes(labels, 2)]
end
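The training loop itself is not reproduced here. A minimal sketch in the same Lux/Optimisers/Zygote style, reusing the model, loss, and helper functions defined above, might look like the following; the Adam learning rate of 1e-3 and the 30 full-batch epochs are our assumptions, not the chapter's actual settings:

```julia
# Hypothetical training loop (learning rate and epoch count are assumed values).
function train(model, ps, st, X, Y; epochs = 30, η = 1.0f-3)
    opt_state = Optimisers.setup(Adam(η), ps)
    for _ in 1:epochs
        # Gradient of the scalar loss with respect to the parameters only.
        grads = Zygote.gradient(
            p -> first(cross_entropy_loss(model, p, st, (X, Y))), ps)[1]
        opt_state, ps = Optimisers.update(opt_state, ps, grads)
    end
    return ps
end

ps = train(model, ps, st, X_train, Y_train)

# Evaluate on the held-out patches.
logits_test, _ = model(X_test, ps, st)
test_probs = softmax_cols(logits_test)
test_accuracy = mean(predicted_classes(test_probs) .== true_classes(Y_test))
@printf "Test accuracy: %.3f\n" test_accuracy
```

Full-batch gradient steps are fine for 720 training patches; on larger datasets one would iterate over mini-batches inside each epoch instead.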
After training, we report two numbers. The test cross-entropy measures how much probability the network assigns to the correct class (lower is better), while the accuracy tells us how often it predicts the right class on unseen patches. For a teaching example like this one, we want both numbers to show that the CNN has learned the visible image pattern instead of just memorizing the training set.
Layered accuracy: 1.000
Faulted accuracy: 1.000
Dome accuracy: 1.000
Those per-class accuracies are useful because a single overall accuracy can hide one weak class. If one seismic pattern is consistently confused with another, it will show up here even when the mean score still looks good.
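Given vectors of predicted and true class indices, per-class accuracy is just a masked mean. A plain-Julia sketch (perclass_accuracy is our own helper, not part of the chapter's code):

```julia
using Statistics

# Accuracy restricted to the samples whose true label is a given class.
function perclass_accuracy(predicted, truth, class_id)
    mask = truth .== class_id
    mean(predicted[mask] .== truth[mask])
end

truth     = [1, 1, 2, 2, 3, 3]
predicted = [1, 1, 2, 3, 3, 3]   # one "Faulted" patch misread as "Dome"
[perclass_accuracy(predicted, truth, c) for c in 1:3]   # → [1.0, 0.5, 1.0]
```

In this toy illustration the overall accuracy is 5/6 ≈ 0.83, yet the per-class view immediately shows that all of the errors fall on class 2.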
In the plot, each panel is one input image and the title shows the predicted class. That gives a direct visual check that the network output matches the pattern a human reader would also identify.
8.5 Key CNN architectures
Several landmark architectures expanded the capabilities of CNNs:
ResNet (He et al., 2016) — introduces skip connections that add the input of a block to its output, enabling training of very deep networks (100+ layers) without vanishing gradients.
U-Net (Ronneberger et al., 2015) — an encoder-decoder architecture with skip connections at each resolution level, originally designed for biomedical image segmentation. Widely adopted in geoscience for dense prediction tasks.
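The residual idea behind ResNet — add a block's input to its output — reduces to one line in plain Julia (residual is our own name; f stands in for any learned transformation):

```julia
# A residual connection: the block learns only the *change* to apply to x.
residual(f, x) = x .+ f(x)

f(x) = 0.1 .* x .^ 2           # stand-in for a learned transformation
residual(f, [1.0, 2.0, 3.0])   # ≈ [1.1, 2.4, 3.9]
```

If f learns to output zeros, the block reduces to the identity, which is why stacks of 100+ residual blocks remain trainable; in Lux the same pattern is available as SkipConnection(layer, +).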
8.6 Geoscience applications
CNNs are the dominant architecture for geoscience tasks involving gridded spatial data:
Seismic fault detection — Wu et al. (2019) trained a 3D CNN (FaultSeg3D) on synthetic seismic volumes to segment faults in 3D, demonstrating that CNNs can detect complex fault geometries directly from seismic data.
Earthquake detection — Perol et al. (2018) developed ConvQuake, a CNN that detects and locates earthquakes directly from raw seismic waveforms, outperforming traditional detection methods in noisy environments.
Seismic waveform classification and first-break picking — Yuan et al. (2020) used a CNN for waveform classification and first-break picking, showing how convolutional models can detect local seismic patterns directly from traces.
Remote sensing — land-use classification, mineral mapping, and change detection from satellite and airborne imagery are natural CNN applications, as the data is inherently image-like.
Seismic image interpretation — 2D CNNs can classify local image patterns such as layered structure, faults, or dome-like geometry from small patches, as demonstrated in the code example above.
The key insight is this: whenever your geoscience data lives on a regular grid, a CNN is likely a good starting point. The spatial weight sharing built into convolutions matches the physics of spatially correlated earth properties.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778. https://doi.org/10.1109/CVPR.2016.90
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551. https://doi.org/10.1162/neco.1989.1.4.541
Perol, T., Gharbi, M., & Denolle, M. (2018). Convolutional neural network for earthquake detection and location. Science Advances, 4(2), e1700578. https://doi.org/10.1126/sciadv.1700578
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
Wu, X., Liang, L., Shi, Y., & Fomel, S. (2019). FaultSeg3D: Using synthetic data sets to train an end-to-end convolutional neural network for 3D seismic fault segmentation. Geophysics, 84(3), IM35–IM45. https://doi.org/10.1190/geo2018-0646.1
Yuan, S., Liu, J., Wang, S., Wang, T., & Shi, P. (2020). Seismic waveform classification and first-break picking using convolution neural networks. IEEE Geoscience and Remote Sensing Letters, 17(8), 1408–1412. https://doi.org/10.1109/LGRS.2019.2948601