4 Neural Networks
At this point you know how to read files, work with arrays and tables, and make figures. That is enough to start asking a more interesting question: can we build a model that learns a pattern directly from data instead of us writing the rule by hand?
That is the role of a neural network. A neural network is just a layered function with many adjustable parameters. During training, those parameters are nudged until the network maps inputs to outputs in a useful way. The idea is simple. What makes neural networks powerful is that with enough data, enough nonlinearity, and a sensible training setup, they can approximate very complicated relationships.
For geoscience, this matters because many problems involve patterns that are real but hard to write down explicitly: seismic facies, weather evolution, surrogate models for PDE solvers, or relationships between sparse observations and hidden structure in the subsurface. Neural networks are not magic shortcuts around physics, but they are flexible function approximators that become extremely useful when combined with scientific knowledge.
4.1 The basic picture
At the smallest scale, a neuron takes an input vector \(\mathbf{x}\), forms a weighted sum, adds a bias, and applies a nonlinear activation:
\[ z = \sigma\!\bigl(\mathbf{w}^\top \mathbf{x} + b\bigr) \]
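This single-neuron computation can be written in a few lines of NumPy. The numbers below are invented purely for illustration, and the sigmoid is just one common choice of activation \(\sigma\):

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values, chosen only to make the arithmetic concrete.
x = np.array([0.5, -1.2, 3.0])   # input vector
w = np.array([0.8, 0.1, -0.4])   # weights, one per input component
b = 0.2                          # bias

z = sigmoid(w @ x + b)           # weighted sum, add bias, apply nonlinearity
```

During training it is the entries of `w` and `b` that get adjusted; the activation stays fixed.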
One neuron is not very interesting. A layer of neurons can learn several features at once, and a stack of layers can build increasingly abstract representations. In practice, training a network always comes back to the same loop:
- Make a prediction from the input.
- Compare that prediction with the target.
- Measure the mismatch with a loss function.
- Adjust the parameters to reduce the loss.
That is the whole game. The rest of this part is about understanding what each of those steps really means.
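To make the loop concrete before the real examples arrive, here is a minimal NumPy sketch of those four steps. The model has a single adjustable parameter, the data are synthetic, and the learning rate is an arbitrary illustrative choice; real networks just repeat this same pattern with many more parameters and automatic gradients:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y is roughly 3*x plus noise (invented for illustration).
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)

w = 0.0      # the single adjustable parameter, starting from an arbitrary guess
lr = 0.1     # learning rate: how far to nudge the parameter each step

for step in range(200):
    y_hat = w * x                        # 1. make a prediction from the input
    residual = y_hat - y                 # 2. compare prediction with target
    loss = np.mean(residual ** 2)        # 3. measure mismatch (mean squared error)
    grad = 2.0 * np.mean(residual * x)   # 4. gradient of the loss w.r.t. w ...
    w -= lr * grad                       #    ... and nudge w to reduce the loss
```

After a couple of hundred iterations `w` lands near 3, the slope hidden in the data. Nothing about the loop changes when the model becomes a deep network; only steps 1 and 4 get more elaborate.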
4.2 How this part is organized
The next chapter gives you a first end-to-end example before any serious theory. That is intentional. It is easier to understand the moving parts once you have seen the whole workflow in action.
After that, we slow down and unpack the pieces one by one:
- Your First Neural Network shows the full workflow: data, model, training, and prediction.
- Building Blocks of Neural Networks explains neurons, layers, losses, gradients, and optimizers.
- The remaining chapters cover the major architecture families used throughout modern machine learning (multilayer perceptrons, convolutional networks, recurrent networks, transformers, and graph neural networks), and the final group of chapters turns to generative modeling: autoencoders, GANs, diffusion models, and flow matching.
The goal is not to turn you into a deep-learning theorist. The goal is to make the models legible enough that when they reappear later in scientific machine learning, they feel like tools you understand rather than black boxes.
4.3 Notation used in this part
To keep the mathematics readable, this part uses a few conventions consistently:
- Bold symbols such as \(\mathbf{x}\) and \(\mathbf{h}\) denote vectors.
- Superscripts in parentheses, such as \(\mathbf{h}^{(l)}\), index layers or network depth.
- Subscripts, such as \(\mathbf{h}_t\), usually index time steps, samples, or nodes.
- \(\mathcal{L}\) denotes a loss function, while subscripts such as \(\mathcal{L}_{\mathrm{data}}\) or \(\mathcal{L}_{\mathrm{bc}}\) identify particular pieces of that loss.
These are not universal rules across all textbooks, but they will keep the notation in this book steady from chapter to chapter.
4.4 What to watch for
When you first learn neural networks, it is tempting to focus on architecture names and package syntax. Those matter, but the deeper questions are simpler:
- What is the input, and what is the output?
- What exactly is being learned?
- What loss is being minimized?
- What assumptions are hidden in the data split, the architecture, and the training loop?
If you keep asking those questions, most neural-network papers become much easier to read.
4.5 Summary
Neural networks are flexible parameterized functions trained from data. In geoscience they become especially useful when they are paired with domain structure, physical constraints, and careful evaluation. The next chapter starts with a concrete example so you can see the workflow before we dissect the machinery.