Uncertainty quantification#

R.A. Collenteur

In this notebook it is shown how to compute the uncertainty of the model simulation using the built-in uncertainty quantification options of Pastas.

  • Confidence interval of simulation

  • Prediction interval of simulation

  • Confidence interval of step response

  • Confidence interval of block response

  • Confidence interval of contribution

  • Custom confidence intervals

The compute the confidence intervals, parameters sets are drawn from a multivariate normal distribution based on the jacobian matrix obtained during parameter optimization. This method to quantify uncertainties has some underlying assumptions on the model residuals (or noise) that should be checked. This notebook only deals with parameter uncertainties and not with model structure uncertainties.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import norm

import pastas as ps

ps.set_log_level("ERROR")
ps.show_versions()
Pastas     : 2.0.0
Python     : 3.14.0
Numpy      : 2.4.6
Pandas     : 3.0.3
Scipy      : 1.17.1
Matplotlib : 3.10.9
Numba      : 0.65.1

Create a model#

We first create a toy model to simulate the groundwater levels in southeastern Austria. We will use this model to illustrate how the different methods for uncertainty quantification can be used.

gwl = (
    pd.read_csv("data_wagna/head_wagna.csv", index_col=0, parse_dates=True, skiprows=2)
    .squeeze()
    .loc["2006":]
    .iloc[0::10]
)
gwl = gwl.loc[~gwl.index.duplicated(keep="first")]

evap = pd.read_csv(
    "data_wagna/evap_wagna.csv", index_col=0, parse_dates=True, skiprows=2
).squeeze()
prec = pd.read_csv(
    "data_wagna/rain_wagna.csv", index_col=0, parse_dates=True, skiprows=2
).squeeze()

# Model settings
tmin = pd.Timestamp("2007-01-01")  # Needs warmup
tmax = pd.Timestamp("2016-12-31")

ml = ps.Model(gwl)
sm = ps.RechargeModel(
    prec, evap, recharge=ps.rch.FlexModel(), rfunc=ps.Exponential(), name="rch"
)
ml.add_stressmodel(sm)

# Add the ARMA(1,1) noise model and solve the Pastas model
ml.add_noisemodel(ps.ArmaNoiseModel())
ml.solve(tmin=tmin, tmax=tmax, report="full")
Fit report GWL                          Fit Statistics
======================================================
nfev     34                     EVP              77.36
nobs     365                    R2                0.77
noise    True                   RMSE              0.18
tmin     2007-01-01 00:00:00    AICc          -2053.02
tmax     2016-12-31 00:00:00    BIC           -2018.42
freq     D                      Obj               0.63
freq_obs None                   ___                   
warmup   3650 days 00:00:00     Interp.             No

Parameters (9 optimized)
======================================================
                optimal     initial   vary      stderr
rch_A          0.638119    0.529381   True     ±13.10%
rch_a         74.655762   10.000000   True     ±15.26%
rch_srmax    364.057456  250.000000   True     ±41.22%
rch_lp         0.250000    0.250000  False         nan
rch_ks       353.703440  100.000000   True    ±153.63%
rch_gamma      3.804077    2.000000   True     ±15.06%
rch_kv         1.283578    1.000000   True     ±18.34%
rch_simax      2.000000    2.000000  False         nan
constant_d   262.598597  263.166264   True  ±3.25e-02%
noise_alpha  111.959374   10.000000   True     ±28.47%
noise_beta     8.261044    1.000000   True     ±14.93%

Parameter correlations |rho| > 0.5
======================================================
rch_A rch_a          0.80
rch_A constant_d    -0.74
rch_a constant_d    -0.70
rch_srmax rch_ks     0.97
rch_srmax rch_gamma  0.58
rch_srmax rch_kv     0.66
rch_ks rch_gamma     0.73
rch_ks rch_kv        0.58

Diagnostic Checks#

Before we perform the uncertainty quantification, we should check if the underlying statistical assumptions are met. We refer to the notebook on Diagnostic checking for more details on this.

ml.plots.diagnostics();
../_images/a6a099e3fccdc7a349ad221381d37d6eb325190df40a339bbcb523cffda277ab.png

Confidence intervals#

After the model is calibrated, a solver attribute is added to the Pastas Model object (ml.solver). This object contains information about the optimizations (e.g., the jacobian matrix) and a number of methods that can be used to quantify uncertainties.

ci = ml.solver.ci_simulation(alpha=0.05, n=1000)
ax = ml.plot(figsize=(10, 3))
ax.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], color="lightgray")
ax.legend(["Observations", "Simulation", "95% Confidence interval"], ncol=3, loc=2)
<matplotlib.legend.Legend at 0x7d4cbd7f1d30>
../_images/d5771232e051651525597162d891ecee999774510ee18a2a6fb40855f285de0d.png

Prediction interval#

ci = ml.solver.prediction_interval(n=1000)
ax = ml.plot(figsize=(10, 3))
ax.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], color="lightgray")
ax.legend(["Observations", "Simulation", "95% Prediction interval"], ncol=3, loc=2)
<matplotlib.legend.Legend at 0x7d4cc42596a0>
../_images/c0cee4db328fa2bc2027acd2bb466f7e558780fa7d7cc089ab8fb2faf24d6dd1.png

Checking the quality of the prediction interval#

We can compute the PICP to see what percentage of the measurements are within the 95% prediction interval, which should theoretically be around 0.95.

ps.stats.picp(gwl[ci.index[0] : ci.index[-1]], ci)
np.float64(0.9780821917808219)

Uncertainty of step response#

ci = ml.solver.ci_step_response("rch")
ax = ml.plots.step_response(figsize=(6, 2))
ax.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], color="lightgray")
ax.legend(["Simulation", "95% Confidence interval"], ncol=3, loc=4)
<matplotlib.legend.Legend at 0x7d4cc444cec0>
../_images/dda731e731003da6b8c5c6a45d5be28d31dffccd1cc03284425ae30328d634d1.png

Uncertainty of block response#

ci = ml.solver.ci_block_response("rch")
ax = ml.plots.block_response(figsize=(6, 2))
ax.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], color="lightgray")
ax.legend(["Simulation", "95% Confidence interval"], ncol=3, loc=1)
<matplotlib.legend.Legend at 0x7d4cc42156a0>
../_images/890b132a4c7913f40bfa54baaa947091ed24f5b1ff2700e0481bb441f0f7fd4b.png

Uncertainty of the contributions#

ci = ml.solver.ci_contribution("rch")
r = ml.get_contribution("rch")
ax = r.plot(figsize=(10, 3))
ax.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], color="lightgray")
ax.legend(["Simulation", "95% Confidence interval"], ncol=3, loc=1)
plt.tight_layout()
../_images/5b299ed791f3ae51e53ec9b8ca9f0d3cedfc2e5b8ef82fd9c778149fe948884f.png

Custom Confidence intervals#

It is also possible to compute the confidence intervals manually, for example to estimate the uncertainty in the recharge or statistics (e.g., SGI, NSE). We can call ml.solver.get_parameter_sample to obtain random parameter samples from a multivariate normal distribution using the optimal parameters and the covariance matrix. Next, we use the parameter sets to obtain multiple simulations of ‘something’, here the recharge.

params = ml.solver.get_parameter_sample(n=1000, name="rch")
data = {}

# Here we run the model n times with different parameter samples
for i, param in enumerate(params):
    data[i] = ml.stressmodels["rch"].get_stress(p=param)

df = pd.DataFrame.from_dict(data, orient="columns").loc[tmin:tmax].resample("YE").sum()
ci = df.quantile([0.025, 0.975], axis=1).transpose()

r = ml.get_stress("rch").resample("YE").sum()
ax = r.plot.bar(figsize=(10, 2), width=0.5, yerr=[r - ci.iloc[:, 0], ci.iloc[:, 1] - r])
ax.set_xticklabels(labels=r.index.year, rotation=0, ha="center")
ax.set_ylabel("Recharge [mm a$^{-1}$]")
ax.legend(ncol=3);
../_images/4bfa22147484662682994977cf6bfebfda43edd34e15f68c1cb2412046a3eb0e.png

Uncertainty of the NSE#

The code pattern shown above can be used for many types of uncertainty analyses. Another example is provided below, where we compute the uncertainty of the Nash-Sutcliffe efficacy.

params = ml.solver.get_parameter_sample(n=1000)
data = []

# Here we run the model n times with different parameter samples
for i, param in enumerate(params):
    sim = ml.simulate(p=param)
    data.append(ps.stats.nse(obs=ml.observations(), sim=sim))

fig, ax = plt.subplots(1, 1, figsize=(4, 3))
plt.hist(data, bins=50, density=True)
ax.axvline(ml.stats.nse(), linestyle="--", color="k")
ax.set_xlabel("NSE [-]")
ax.set_ylabel("frequency [-]")

mu, std = norm.fit(data)

# Plot the PDF.
xmin, xmax = ax.set_xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mu, std)
ax.plot(x, p, "k", linewidth=2)
[<matplotlib.lines.Line2D at 0x7d4cbff9dfd0>]
../_images/9795369c30f735609a9d81f0de55522be5c8e58357a8c019b991ea262e56ba38.png