5 Your First Neural Network

using Lux, Random, Optimisers, Zygote, Statistics, Printf
Let’s skip the theory and train a neural network end to end. You don’t need to understand how it works yet. The goal of this chapter is to give you a feel for a complete machine-learning workflow: load data, build a model, train it, and use it to predict something. Everything you see here will be explained properly in the chapters that follow.
5.1 Sauna Satisfaction Predictor
In an alternate universe, the Geological Survey of Finland maintains sauna visitor logs. For each visit, three inputs are recorded:
- Sauna temperature (°C)
- Outside temperature (°C)
- Minutes since last coffee
After the session, the visitor assigns a Satisfaction Score between 0 and 1. We will train a small neural network to predict this score from the three inputs.
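The sauna logs are, of course, fictional. If you want to run this chapter end to end, here is one way to synthesize a plausible sauna_log.txt yourself; the scoring formula below is invented purely for this example, and any file with the same four-column CSV layout would work just as well.

```julia
using Random, Printf

# Synthesize 200 fictional sauna visits. The scoring formula is made up:
# hotter sauna and colder outside help, a long coffee gap hurts, plus noise.
rng = Xoshiro(42)
open("sauna_log.txt", "w") do io
    println(io, "sauna_temp,outside_temp,coffee_min,score")
    for _ in 1:200
        s = 60 + 40 * rand(rng)       # sauna temperature, 60-100 °C
        o = -30 + 40 * rand(rng)      # outside temperature, -30-10 °C
        c = 120 * rand(rng)           # minutes since last coffee, 0-120
        score = clamp(0.5 + (s - 80) / 60 - (o + 10) / 80 - c / 400 +
                      0.05 * randn(rng), 0, 1)
        @printf io "%.1f,%.1f,%.1f,%.3f\n" s o c score
    end
end
```

The exact numbers don’t matter; what matters is that the inputs carry real (if synthetic) signal for the network to find.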
5.1.1 Step 1: Load and prepare the data
Read the comma-separated data from sauna_log.txt and normalize the inputs to zero mean and unit variance. This is standard practice: neural networks learn faster when inputs are on similar scales.
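To see what the normalization does on its own, here is a tiny standalone example; the three columns are made-up visits, not rows from the log:

```julia
using Statistics

A = Float32[80 65 95; -20 5 -25; 10 100 30]   # three made-up visits as columns
An = (A .- mean(A, dims = 2)) ./ std(A, dims = 2)
mean(An, dims = 2)   # each input row now has mean ≈ 0
std(An, dims = 2)    # and standard deviation 1
```

Each row (each input variable) is shifted and scaled independently, so sauna temperature, outside temperature, and coffee delay all end up on the same footing.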
lines = readlines("sauna_log.txt")
data = reduce(hcat, [parse.(Float32, split(l, ",")) for l in lines[2:end]])
X_raw = data[1:3, :]
Y = data[4:4, :]
# Normalize inputs
mu = mean(X_raw, dims = 2)
sig = std(X_raw, dims = 2)
X = (X_raw .- mu) ./ sig
# Split into train and test
rng = Xoshiro(123)
n_train = round(Int, 0.8 * size(X, 2))
idx = randperm(rng, size(X, 2))
X_train = X[:, idx[1:n_train]]
Y_train = Y[:, idx[1:n_train]]
X_test = X[:, idx[n_train+1:end]]
Y_test = Y[:, idx[n_train+1:end]]
@printf "Train: %d Test: %d\n" size(X_train, 2) size(X_test, 2)
Train: 160 Test: 40
5.1.2 Step 2: Build the model
Define a small neural network with one hidden layer of 8 neurons. Don’t worry about what Dense, tanh, or Chain mean yet—we will cover all of that shortly.
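For the curious, you can already count the numbers that training will adjust. A Dense(n => m) layer stores n*m weights plus m biases, so this model has 3*8 + 8 = 32 numbers in the hidden layer and 8*1 + 1 = 9 in the output layer, 41 in total. Lux’s parameterlength helper confirms the count:

```julia
using Lux, Random

# Same architecture as in the chapter; parameterlength counts trainable numbers.
m = Chain(Dense(3 => 8, tanh), Dense(8 => 1))
Lux.parameterlength(m)   # 3*8 + 8 + 8*1 + 1 = 41
```

Forty-one numbers is tiny by neural-network standards, but plenty for three inputs and one output.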
model = Chain(
Dense(3 => 8, tanh),
Dense(8 => 1)
)
# Initialize parameters and state
ps, st = Lux.setup(rng, model)
# Define loss function (mean squared error)
function mse_loss(model, ps, st, data)
x, y_true = data
y_pred, st_new = model(x, ps, st)
loss = mean((y_pred .- y_true) .^ 2)
return loss, st_new, ()
end
mse_loss (generic function with 1 method)
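The mean squared error itself is nothing exotic: square each prediction error, then average. A hand check on three made-up predictions:

```julia
using Statistics

y_pred = Float32[0.9, 0.2, 0.6]   # made-up model outputs
y_true = Float32[1.0, 0.0, 0.5]   # made-up true scores
mean((y_pred .- y_true) .^ 2)     # (0.01 + 0.04 + 0.01) / 3 ≈ 0.02
```

Squaring punishes large errors much harder than small ones, which is exactly the behavior we want the optimizer to respond to.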
5.1.3 Step 3: Train
Feed the training data through the network 500 times (epochs), adjusting the parameters each time to reduce the prediction error.
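The Training.single_train_step! helper used below hides three moves: compute gradients of the loss, update the optimizer state, and update the parameters. A rough hand-rolled sketch of one such step, using Zygote and Optimisers directly on random stand-in data (the batch here is invented for illustration):

```julia
using Lux, Random, Optimisers, Zygote, Statistics

rng = Xoshiro(0)
model = Chain(Dense(3 => 8, tanh), Dense(8 => 1))
ps, st = Lux.setup(rng, model)
x = rand(rng, Float32, 3, 16)   # stand-in input batch
y = rand(rng, Float32, 1, 16)   # stand-in targets

opt_state = Optimisers.setup(Adam(0.01f0), ps)
# 1. gradients of the MSE loss with respect to the parameters
grads = Zygote.gradient(p -> mean((first(model(x, p, st)) .- y) .^ 2), ps)[1]
# 2.-3. update optimizer state and parameters in one call
opt_state, ps = Optimisers.update(opt_state, ps, grads)
```

Repeating that step epoch after epoch is all the training loop below does; the helper just packages the bookkeeping into a TrainState.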
function train_model(model, ps, st, data; epochs = 500, lr = 0.01f0)
tstate = Training.TrainState(model, ps, st, Adam(lr))
for epoch in 1:epochs
_, loss, _, tstate = Training.single_train_step!(
AutoZygote(), mse_loss, data, tstate
)
if epoch == 1 || epoch % 100 == 0
@printf "Epoch %4d MSE = %.6f\n" epoch loss
end
end
return tstate
end
tstate = train_model(model, ps, st, (X_train, Y_train))
# Evaluate on test data
Y_pred_test, _ = model(X_test, tstate.parameters, tstate.states)
test_mse = mean((Y_pred_test .- Y_test) .^ 2)
test_mae = mean(abs.(Y_pred_test .- Y_test))
# Baseline: always predict the training-set mean score
baseline = fill(mean(Y_train), size(Y_test))
baseline_mae = mean(abs.(baseline .- Y_test))
@printf "Test MSE: %.6f Test MAE: %.4f\n" test_mse test_mae
@printf "Baseline MAE (predict mean): %.4f\n" baseline_mae
Epoch 1 MSE = 0.298768
Epoch 100 MSE = 0.021929
Epoch 200 MSE = 0.008098
Epoch 300 MSE = 0.006308
Epoch 400 MSE = 0.005393
Epoch 500 MSE = 0.005107
Test MSE: 0.006323 Test MAE: 0.0615
Baseline MAE (predict mean): 0.2668
If the network’s test MAE is clearly lower than the baseline MAE, it is learning meaningful structure instead of only predicting the average. In geoscience terms, that means the model is extracting relationships from the inputs (temperature context and coffee delay) rather than memorizing noise.
5.1.4 Step 4: Predict
Use the trained model to score a few specific sauna scenarios:
scenarios = [
(sauna=80, outside=-20, coffee=10, label="Perfect: 80°C, -20°C, fresh coffee"),
(sauna=65, outside=5, coffee=100, label="Poor: lukewarm, mild, stale coffee"),
(sauna=95, outside=-25, coffee=30, label="Extreme: 95°C, deep winter"),
]
function predict_score(s, model, ps, st, mu, sig)
x = Float32.([(s.sauna - mu[1]) / sig[1],
(s.outside - mu[2]) / sig[2],
(s.coffee - mu[3]) / sig[3]])
pred, _ = model(reshape(x, 3, 1), ps, st)
return pred[1]
end
for s in scenarios
sc = predict_score(s, model, tstate.parameters, tstate.states, mu, sig)
@printf "%-35s %.2f\n" s.label sc
end
Perfect: 80°C, -20°C, fresh coffee 1.04
Poor: lukewarm, mild, stale coffee 0.07
Extreme: 95°C, deep winter 0.41
5.2 What just happened?
In about 30 lines of code you:
- Loaded data from a file and split it into training and test sets.
- Built a model — a small neural network.
- Trained it — the computer adjusted the 41 numbers inside the model (its weights and biases) so that the predictions got closer and closer to the real scores.
- Used it — fed in new inputs and got predictions back.
You don’t yet know why this works. The next chapters will explain every part: what a neuron is, what the loss function does, how gradients flow backward through the network, and why the optimizer matters. But the overall workflow — data → model → train → predict — stays exactly the same for every neural network in this book.
This chapter deletes sauna_log.txt in a final hidden cleanup step so the temporary example file does not linger in the project directory after rendering. If you stop midway through the workflow or run only selected code blocks, remove it manually:
rm sauna_log.txt
On PowerShell:
Remove-Item sauna_log.txt