{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Example 1: Pastas Cookbook recipe\n", "***\n", "\n", "This notebook is supplementary material to the following article in Groundwater:\n", "\n", "*Collenteur, R.A., Bakker, M., Caljé, R., Klop, S.A. and Schaars, F. (2019), Pastas: Open Source Software for the Analysis of Groundwater Time Series. Groundwater, 57: 877-885. [doi:10.1111/gwat.12925](https://ngwa.onlinelibrary.wiley.com/doi/full/10.1111/gwat.12925)*\n", "\n", "**Please note that the numbers and figures in this Notebook may slightly differ from those in the original publication due to some minor improvements/changes in the software code.**\n", " \n", "In this notebook the Pastas \"cookbook\" recipe is shown. In this example it is investigated how well the heads measured in a borehole near Kingstown, Rhode Island, US, can be simulated using rainfall and potential evaporation. A transfer function noise (TFN) model using impulse response function is created to simulate the observed heads.\n", "\n", "The observed heads are obtained from the Ground-Water Climate Response Network (CRN) of the USGS (https://groundwaterwatch.usgs.gov/). The corresponding USGS site id is 412918071321001. The rainfall data is taken from the Global Summary of the Day dataset (GSOD) available at the National Climatic Data Center (NCDC). The rainfall series is obtained from the weather station in Kingston (station number: NCDC WBAN 54796) located at 41.491$^\\circ$, -71.541$^\\circ$. The evaporation is calculated from the mean temperature obtained from the same USGS station using Thornthwaite's method (Pereira and Pruitt, 2004).\n", "\n", "*Pereira AR, Pruitt WO (2004), Adaptation of the Thornthwaite scheme for estimating daily reference evapotranspiration, Agricultural Water Management 66(3), 251-257*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1. Importing the python packages\n", "The first step to creating the TFN model is to import the python packages. In this notebook two packages are used, the Pastas package and the Pandas package to import the time series data. Both packages are short aliases for convenience (`ps` for the Pastas package and `pd` for the Pandas package). The other packages that are imported are not needed for the analysis but are needed to make the publication figures." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "import pastas as ps\n", "\n", "ps.set_log_level(\"ERROR\")\n", "ps.show_versions()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2. Reading the time series\n", "The next step is to import the time series data. Three series are used in this example; the observed groundwater head, the rainfall and the evaporation. The data can be read using different methods, in this case the Pandas `read_csv` method is used to read the csv files. Each file consists of two columns; a date column called 'Date' and a column containing the values for the time series. The index column is the first column and is read as a date format. The heads series are stored in the variable `obs`, the rainfall in `rain` and the evaporation in `evap`. All variables are transformed to SI-units." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "obs = pd.read_csv(\"obs.csv\", index_col=\"Date\", parse_dates=True).squeeze() * 0.3048\n", "rain = pd.read_csv(\"rain.csv\", index_col=\"Date\", parse_dates=True).squeeze() * 0.3048\n", "rain = rain.asfreq(\"D\", fill_value=0.0) # There are some nan-values present\n", "evap = pd.read_csv(\"evap.csv\", index_col=\"Date\", parse_dates=True).squeeze() * 0.3048" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3. Creating the model\n", "After reading in the time series, a Pastas Model instance can be created, `Model`. The `Model` instance is stored in the variable `ml` and takes two input arguments; the head time series `obs`, and a model name: \"Kingstown\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml = ps.Model(obs.loc[::14], name=\"Kingstown\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4. Adding stress models\n", "A `RechargeModel` instance is created and stored in the variable `rm`, taking the rainfall and potential evaporation time series as input arguments, as well as a name and a response function. In this example the Gamma response function is used (the Gamma function is available as `ps.Gamma`). After creation the recharge stress model instance is added to the model. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rm = ps.RechargeModel(rain, evap, name=\"recharge\", rfunc=ps.Gamma())\n", "ml.add_stressmodel(rm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4a. Adding a noise model\n", "\n", "Required since Pastas 1.5" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nm = ps.ArNoiseModel()\n", "ml.add_noisemodel(nm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 5. Solving the model\n", "The model parameters are estimated by calling the `solve` method of the `Model` instance. In this case the default settings are used (for all but the tmax argument) to solve the model. Several options can be specified in the `.solve` method, for example; a `tmin` and `tmax` or the type of solver used (this defaults to a least squares solver, `ps.LeastSquares()`). This `solve` method prints a fit report with basic information about the model setup and the results of the model fit." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml.solve(tmax=\"2014\")\n", "\n", "# Print some information on the model fit for the validation period\n", "print(\n", " \"\\nThe R2 and the RMSE in the validation period are \",\n", " ml.stats.rsq(tmin=\"2015\", tmax=\"2019\").round(2),\n", " \"and\",\n", " ml.stats.rmse(tmin=\"2015\", tmax=\"2019\").round(2),\n", " \", respectively.\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6. Visualizing the results\n", "The final step of the \"cookbook\" recipe is to visualize the results of the TFN model. The Pastas package has several build in plotting methods, available through the `ml.plots` instance. Here the `.results` plotting method is used. This method plots an overview of the model results, including the simulation and the observations of the groundwater head, the optimized model parameters, the residuals and the noise, the contribution of each stressmodel, and the step response function for each stressmodel. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml.plots.results(tmax=\"2018\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7. Diagnosing the noise series\n", "The `diagnostics` plot can be used to interpret how well the noise follows a normal distribution and suffers from autocorrelation (or not)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml.plots.diagnostics();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Make plots for publication\n", "In the next codeblocks the Figures used in the Pastas paper are created. The following figures are created:\n", "\n", "- Figure of the impulse and step response for the scaled Gamma response function\n", "- Figure of the stresses used in the model\n", "- Figure of the modelfit and the step response\n", "- Figure of the model fit as returned by Pastas\n", "- Figure of the model residuals and noise\n", "- Figure of the Autocorrelation function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make a plot of the impulse and step response for the Gamma and Hantush functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rfunc = ps.Gamma(cutoff=0.999)\n", "p = [100, 1.5, 15]\n", "b = np.append(0, rfunc.block(p))\n", "s = rfunc.step(p)\n", "\n", "rfunc2 = ps.Hantush(cutoff=0.999)\n", "p2 = [-100, 30, 0.7]\n", "b2 = np.append(0, rfunc2.block(p2))\n", "s2 = rfunc2.step(p2)\n", "\n", "# Make a figure of the step and block response\n", "fig, [ax1, ax2] = plt.subplots(1, 2, sharex=True, figsize=(8, 4))\n", "ax1.plot(b)\n", "ax1.plot(b2)\n", "ax1.set_ylabel(\"block response\")\n", "ax1.set_xlabel(\"days\")\n", "ax1.legend([\"Gamma\", \"Hantush\"], handlelength=1.3)\n", "ax1.axhline(0.0, linestyle=\"--\", c=\"k\")\n", "\n", "ax2.plot(s)\n", "ax2.plot(s2)\n", "ax2.set_xlim(0, 100)\n", "ax2.set_ylim(-105, 105)\n", "ax2.set_ylabel(\"step response\")\n", "ax2.set_xlabel(\"days\")\n", "ax2.axhline(0.0, linestyle=\"--\", c=\"k\")\n", "ax2.annotate(\"\", xy=(95, 100), xytext=(95, 0), arrowprops={\"arrowstyle\": \"<->\"})\n", "ax2.annotate(\"A\", xy=(95, 100), xytext=(85, 50))\n", "ax2.annotate(\"\", xy=(95, -100), xytext=(95, 0), arrowprops={\"arrowstyle\": \"<->\"})\n", "ax2.annotate(\"A\", xy=(95, 100), xytext=(85, -50))\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make a plot of the stresses used in the model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, [ax1, ax2, ax3] = plt.subplots(3, 1, sharex=True, figsize=(8, 7))\n", "\n", "ax1.plot(obs, \"k.\", label=\"obs\", markersize=2)\n", "ax1.set_ylabel(\"head (m)\", labelpad=0)\n", "ax1.set_yticks([-4, -3, -2])\n", "\n", "plot_rain = ax2.plot(rain * 1000, color=\"k\", label=\"prec\", linewidth=1)\n", "ax2.set_ylabel(\"rain (mm/d)\", labelpad=-5)\n", "ax2.set_xlabel(\"Date\")\n", "ax2.set_ylim([0, 150])\n", "ax2.set_yticks(np.arange(0, 151, 50))\n", "\n", "plot_evap = ax3.plot(evap * 1000, \"k\", label=\"evap\", linewidth=1)\n", "ax3.set_ylabel(\"evap (mm/d)\")\n", "ax3.tick_params(\"y\")\n", "ax3.set_ylim([0, 8])\n", "\n", "plt.xlim([pd.Timestamp(\"2003\"), pd.Timestamp(\"2019\")])\n", "ax2.set_xlabel(\"\")\n", "ax3.set_xlabel(\"year\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make a custom figure of the model fit and the estimated step response" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create the main plot\n", "fig, ax = plt.subplots(figsize=(16, 5))\n", "ax.plot(obs, marker=\".\", c=\"grey\", linestyle=\" \")\n", "ax.plot(obs.loc[:\"2013\":14], marker=\"x\", markersize=7, c=\"C3\", linestyle=\" \", mew=2)\n", "ax.plot(ml.simulate(tmax=\"2019\"), c=\"k\")\n", "plt.ylabel(\"head (m)\")\n", "plt.xlabel(\"year\")\n", "plt.title(\"\")\n", "plt.xlim(pd.Timestamp(\"2003\"), pd.Timestamp(\"2019\"))\n", "plt.ylim(-4.7, -1.6)\n", "plt.yticks(np.arange(-4, -1, 1))\n", "\n", "# Create the arrows indicating the calibration and validation period\n", "ax.annotate(\n", " \"calibration period\",\n", " xy=(pd.Timestamp(\"2003-01-01\"), -4.6),\n", " xycoords=\"data\",\n", " xytext=(300, 0),\n", " textcoords=\"offset points\",\n", " arrowprops=dict(arrowstyle=\"->\"),\n", " va=\"center\",\n", " ha=\"center\",\n", ")\n", "ax.annotate(\n", " \"\",\n", " xy=(pd.Timestamp(\"2014-01-01\"), -4.6),\n", " xycoords=\"data\",\n", " xytext=(-230, 0),\n", " textcoords=\"offset points\",\n", " arrowprops=dict(arrowstyle=\"->\"),\n", " va=\"center\",\n", " ha=\"center\",\n", ")\n", "\n", "ax.annotate(\n", " \"validation\",\n", " xy=(pd.Timestamp(\"2014-01-01\"), -4.6),\n", " xycoords=\"data\",\n", " xytext=(150, 0),\n", " textcoords=\"offset points\",\n", " arrowprops=dict(arrowstyle=\"->\"),\n", " va=\"center\",\n", " ha=\"center\",\n", ")\n", "ax.annotate(\n", " \"\",\n", " xy=(pd.Timestamp(\"2019-01-01\"), -4.6),\n", " xycoords=\"data\",\n", " xytext=(-85, 0),\n", " textcoords=\"offset points\",\n", " arrowprops=dict(arrowstyle=\"->\"),\n", " va=\"center\",\n", " ha=\"center\",\n", ")\n", "\n", "plt.legend(\n", " [\"observed head\", \"used for calibration\", \"simulated head\"], loc=2, numpoints=3\n", ")\n", "\n", "# Create the inset plot with the step response\n", "ax2 = plt.axes([0.66, 0.65, 0.22, 0.2])\n", "s = ml.get_step_response(\"recharge\")\n", "ax2.plot(s, c=\"k\")\n", "ax2.set_ylabel(\"response\")\n", "ax2.set_xlabel(\"days\", labelpad=-15)\n", "ax2.set_xlim(0, s.index.size)\n", "ax2.set_xticks([0, 300])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make a figure of the fit report" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from matplotlib.font_manager import FontProperties\n", "\n", "font = FontProperties()\n", "# font.set_size(10)\n", "font.set_weight(\"normal\")\n", "font.set_family(\"monospace\")\n", "# font.set_name(\"courier new\")\n", "\n", "plt.text(-1, -1, str(ml.fit_report()), fontproperties=font)\n", "plt.axis(\"off\")\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make a Figure of the noise, residuals and autocorrelation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax1 = plt.subplots(1, 1, figsize=(8, 3))\n", "ml.residuals(tmax=\"2019\").plot(ax=ax1, c=\"k\")\n", "ml.noise(tmax=\"2019\").plot(ax=ax1, c=\"C0\")\n", "plt.xticks(rotation=0, horizontalalignment=\"center\")\n", "ax1.set_ylabel(\"(m)\")\n", "ax1.set_xlabel(\"year\")\n", "ax1.legend([\"residuals\", \"noise\"], ncol=2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ax = ps.plots.acf(ml.noise(), figsize=(9, 2))\n", "ax.set_ylabel(\"ACF (-)\")\n", "ax.set_xlabel(\"lag (days)\")\n", "plt.legend([\"95% confidence interval\"], loc=(0.0, 1.05))\n", "plt.xlim(0, 370)\n", "plt.ylim(-0.25, 0.25)\n", "plt.title(\"\")\n", "plt.grid()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml.stats.diagnostics()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "q, p = ps.stats.stoffer_toloi(ml.noise())\n", "print(\"The hypothesis that there is significant autocorrelation is:\", p)" ] } ], "metadata": { "kernelspec": { "display_name": "pastas (3.13.5)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.5" } }, "nbformat": 4, "nbformat_minor": 4 }