{ "cells": [ { "cell_type": "markdown", "id": "ed9008bd", "metadata": {}, "source": [ "# Uncertainty quantification\n", "*R.A. Collenteur, University of Graz, WIP (May-2021)*\n", "\n", "In this notebook it is shown how to compute the uncertainty of the model simulation using the built-in uncertainty quantification options of Pastas. \n", "\n", "- Confidence interval of simulation\n", "- Prediction interval of simulation\n", "- Confidence interval of step response\n", "- Confidence interval of block response\n", "- Confidence interval of contribution\n", "- Custom confidence intervals\n", "\n", "The compute the confidence intervals, parameters sets are drawn from a multivariate normal distribution based on the jacobian matrix obtained during parameter optimization. This method to quantify uncertainties has some underlying assumptions on the model residuals (or noise) that should be checked. This notebook only deals with parameter uncertainties and not with model structure uncertainties." ] }, { "cell_type": "code", "execution_count": null, "id": "985c67e7", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "from scipy.stats import norm\n", "\n", "import pastas as ps\n", "\n", "ps.set_log_level(\"ERROR\")\n", "ps.show_versions()" ] }, { "cell_type": "markdown", "id": "00e9bf80", "metadata": {}, "source": [ "## Create a model\n", "\n", "We first create a toy model to simulate the groundwater levels in southeastern Austria. We will use this model to illustrate how the different methods for uncertainty quantification can be used. " ] }, { "cell_type": "code", "execution_count": null, "id": "207cbe07", "metadata": {}, "outputs": [], "source": [ "gwl = (\n", " pd.read_csv(\"data_wagna/head_wagna.csv\", index_col=0, parse_dates=True, skiprows=2)\n", " .squeeze()\n", " .loc[\"2006\":]\n", " .iloc[0::10]\n", ")\n", "gwl = gwl.loc[~gwl.index.duplicated(keep=\"first\")]\n", "\n", "evap = pd.read_csv(\n", " \"data_wagna/evap_wagna.csv\", index_col=0, parse_dates=True, skiprows=2\n", ").squeeze()\n", "prec = pd.read_csv(\n", " \"data_wagna/rain_wagna.csv\", index_col=0, parse_dates=True, skiprows=2\n", ").squeeze()\n", "\n", "# Model settings\n", "tmin = pd.Timestamp(\"2007-01-01\") # Needs warmup\n", "tmax = pd.Timestamp(\"2016-12-31\")\n", "\n", "ml = ps.Model(gwl)\n", "sm = ps.RechargeModel(\n", " prec, evap, recharge=ps.rch.FlexModel(), rfunc=ps.Exponential(), name=\"rch\"\n", ")\n", "ml.add_stressmodel(sm)\n", "\n", "# Add the ARMA(1,1) noise model and solve the Pastas model\n", "ml.add_noisemodel(ps.ArmaNoiseModel())\n", "ml.solve(tmin=tmin, tmax=tmax, report=\"full\")" ] }, { "cell_type": "markdown", "id": "4fe52f9b", "metadata": {}, "source": [ "## Diagnostic Checks\n", "\n", "Before we perform the uncertainty quantification, we should check if the underlying statistical assumptions are met. We refer to the notebook on Diagnostic checking for more details on this." ] }, { "cell_type": "code", "execution_count": null, "id": "eb6e620d", "metadata": {}, "outputs": [], "source": [ "ml.plots.diagnostics();" ] }, { "cell_type": "markdown", "id": "26c33c5c", "metadata": {}, "source": [ "## Confidence intervals\n", "\n", "After the model is calibrated, a `solver` attribute is added to the Pastas `Model` object (`ml.solver`). This object contains information about the optimizations (e.g., the jacobian matrix) and a number of methods that can be used to quantify uncertainties." ] }, { "cell_type": "code", "execution_count": null, "id": "b46d9653", "metadata": {}, "outputs": [], "source": [ "ci = ml.solver.ci_simulation(alpha=0.05, n=1000)\n", "ax = ml.plot(figsize=(10, 3))\n", "ax.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], color=\"lightgray\")\n", "ax.legend([\"Observations\", \"Simulation\", \"95% Confidence interval\"], ncol=3, loc=2)" ] }, { "cell_type": "markdown", "id": "29d450c6", "metadata": {}, "source": [ "## Prediction interval" ] }, { "cell_type": "code", "execution_count": null, "id": "8ed06690", "metadata": {}, "outputs": [], "source": [ "ci = ml.solver.prediction_interval(n=1000)\n", "ax = ml.plot(figsize=(10, 3))\n", "ax.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], color=\"lightgray\")\n", "ax.legend([\"Observations\", \"Simulation\", \"95% Prediction interval\"], ncol=3, loc=2)" ] }, { "cell_type": "markdown", "id": "7b9f2e54", "metadata": {}, "source": [ "## Uncertainty of step response " ] }, { "cell_type": "code", "execution_count": null, "id": "6a89f6c1", "metadata": {}, "outputs": [], "source": [ "ci = ml.solver.ci_step_response(\"rch\")\n", "ax = ml.plots.step_response(figsize=(6, 2))\n", "ax.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], color=\"lightgray\")\n", "ax.legend([\"Simulation\", \"95% Prediction interval\"], ncol=3, loc=4)" ] }, { "cell_type": "markdown", "id": "542b1929", "metadata": {}, "source": [ "## Uncertainty of block response" ] }, { "cell_type": "code", "execution_count": null, "id": "ca6e3cdc", "metadata": {}, "outputs": [], "source": [ "ci = ml.solver.ci_block_response(\"rch\")\n", "ax = ml.plots.block_response(figsize=(6, 2))\n", "ax.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], color=\"lightgray\")\n", "ax.legend([\"Simulation\", \"95% Prediction interval\"], ncol=3, loc=1)" ] }, { "cell_type": "markdown", "id": "dc9700e4", "metadata": {}, "source": [ "## Uncertainty of the contributions" ] }, { "cell_type": "code", "execution_count": null, "id": "fe9763d3", "metadata": {}, "outputs": [], "source": [ "ci = ml.solver.ci_contribution(\"rch\")\n", "r = ml.get_contribution(\"rch\")\n", "ax = r.plot(figsize=(10, 3))\n", "ax.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], color=\"lightgray\")\n", "ax.legend([\"Simulation\", \"95% Prediction interval\"], ncol=3, loc=1)\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "id": "6dff9e0d", "metadata": {}, "source": [ "## Custom Confidence intervals\n", "It is also possible to compute the confidence intervals manually, for example to estimate the uncertainty in the recharge or statistics (e.g., SGI, NSE). We can call `ml.solver.get_parameter_sample` to obtain random parameter samples from a multivariate normal distribution using the optimal parameters and the covariance matrix. Next, we use the parameter sets to obtain multiple simulations of 'something', here the recharge." ] }, { "cell_type": "code", "execution_count": null, "id": "4912c10f", "metadata": {}, "outputs": [], "source": [ "params = ml.solver.get_parameter_sample(n=1000, name=\"rch\")\n", "data = {}\n", "\n", "# Here we run the model n times with different parameter samples\n", "for i, param in enumerate(params):\n", " data[i] = ml.stressmodels[\"rch\"].get_stress(p=param)\n", "\n", "df = pd.DataFrame.from_dict(data, orient=\"columns\").loc[tmin:tmax].resample(\"A\").sum()\n", "ci = df.quantile([0.025, 0.975], axis=1).transpose()\n", "\n", "r = ml.get_stress(\"rch\").resample(\"A\").sum()\n", "ax = r.plot.bar(figsize=(10, 2), width=0.5, yerr=[r - ci.iloc[:, 0], ci.iloc[:, 1] - r])\n", "ax.set_xticklabels(labels=r.index.year, rotation=0, ha=\"center\")\n", "ax.set_ylabel(\"Recharge [mm a$^{-1}$]\")\n", "ax.legend(ncol=3);" ] }, { "cell_type": "markdown", "id": "ed5c69b7", "metadata": {}, "source": [ "## Uncertainty of the NSE\n", "The code pattern shown above can be used for many types of uncertainty analyses. Another example is provided below, where we compute the uncertainty of the Nash-Sutcliffe efficacy." ] }, { "cell_type": "code", "execution_count": null, "id": "89fd060b", "metadata": {}, "outputs": [], "source": [ "params = ml.solver.get_parameter_sample(n=1000)\n", "data = []\n", "\n", "# Here we run the model n times with different parameter samples\n", "for i, param in enumerate(params):\n", " sim = ml.simulate(p=param)\n", " data.append(ps.stats.nse(obs=ml.observations(), sim=sim))\n", "\n", "fig, ax = plt.subplots(1, 1, figsize=(4, 3))\n", "plt.hist(data, bins=50, density=True)\n", "ax.axvline(ml.stats.nse(), linestyle=\"--\", color=\"k\")\n", "ax.set_xlabel(\"NSE [-]\")\n", "ax.set_ylabel(\"frequency [-]\")\n", "\n", "mu, std = norm.fit(data)\n", "\n", "# Plot the PDF.\n", "xmin, xmax = ax.set_xlim()\n", "x = np.linspace(xmin, xmax, 100)\n", "p = norm.pdf(x, mu, std)\n", "ax.plot(x, p, \"k\", linewidth=2)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 5 }