\n",
" \n",
"**Rule-of-thumb:** \n",
" \n",
"when the standard errors of the parameters are used, the model residuals should be diagnostically checked.\n",
" \n",
"

\n",
"\n",
"### What to check: assumptions of white noise\n",
"The methods used for the estimation of these standard errors assume that the model residuals behave as white noise with a mean of zero and noise values that are independent from each other (i.e., no significant autocorrelation). Additionally, it is often assumed that the residuals are homoscedastic (i.e., have a constant variance) and follow a normal distribution. The first two assumptions are the most important, having the largest impact on the estimated standard errors of the parameters [(Hipel & McLeod, 2005)](#References). Additionally to these four assumptions, the model residuals should be uncorrelated with any of the input time series. If the residuals are found to behave as white noise, we may assume that the standard errors of the parameters have been accurately estimated and we may use them for inferential analyses.\n",
"\n",
"### How to check: visualization & hypothesis testing\n",
"The assumptions outlined above may be checked through different types of visualization and hypothesis testing of the model residuals. For the latter, statistical tests are used to test the hypothesis that the residuals are e.g., independent, homoscedastic, or normally distributed. These tests typically test a hypothesis with some version of the following Null hypothesis ($H_0$) and the Alternative hypothesis ($H_A$):\n",
"\n",
"- $H_0$: The residuals are independent, homoscedastic, or normally distributed\n",
"- $H_A$: The residuals are not independent, homoscedastic, or normally distributed\n",
"\n",
"All hypothesis tests compute a certain test statistic (e.g., $Q_{test}$), which is compared to a theoretical value according to a certain distribution (e.g., $\\chi^2_{\\alpha, h}$) that depends on the level of significance (e.g., $\\alpha=0.05$) and sometimes the degrees of freedom $h$. The result of a hypothesis test either rejects the Null hypothesis or fails to reject the Null hypothesis, but can never be used to accept the Alternative hypothesis. For example, if an hypothesis test for autocorrelation fails to reject the hypothesis we may conclude that there is no significant autocorrelation in the residuals, but cannot conclude that there is no autocorrelation. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import pastas as ps\n",
"from scipy import stats\n",
"\n",
"import matplotlib.pyplot as plt\n",
"\n",
"ps.set_log_level(\"ERROR\")\n",
"ps.show_versions()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create and calibrate a pastas Model\n",
"To illustrate how to perform diagnostic checking of a Pastas model, a simple model using precipitation and evaporation to simulate the groundwater levels is created. The model is calibrated using a noisemodel with one parameter. Finally, a plot is created using `ml.plots.results()` that shows the simulated groundwater levels, the model residuals and noise and the calibrated parameters values and their estimated standard errors. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import groundwater, rainfall and evaporation time series\n",
"head = pd.read_csv(\n",
" \"data/head_nb1.csv\", parse_dates=[\"date\"], index_col=\"date\"\n",
").squeeze()\n",
"rain = pd.read_csv(\n",
" \"data/rain_nb1.csv\", parse_dates=[\"date\"], index_col=\"date\"\n",
").squeeze()\n",
"evap = pd.read_csv(\n",
" \"data/evap_nb1.csv\", parse_dates=[\"date\"], index_col=\"date\"\n",
").squeeze()\n",
"\n",
"ml = ps.Model(head)\n",
"sm = ps.RechargeModel(rain, evap, rfunc=ps.Exponential(), name=\"rch\")\n",
"ml.add_stressmodel(sm)\n",
"ml.solve(report=False)\n",
"axes = ml.plots.results(figsize=(10, 5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics checking of Pastas models\n",
"Let's say we want to plot the 95% confidence intervals of the simulated groundwater levels that results from uncertainties in the calibrated parameters. Such an analysis would clearly use the standard errors of the parameters, and before we proceed to compute any confidence intervals we should check if the modeled noise agrees with the assumptions of white noise. A noise model was used during calibrations and therefore the noise returned by the `ml.noise()` method should be tested on these assumptions.\n",
"\n",
"### ml.plots.diagnostics\n",
"To quickly diagnose the noise series on the different assumptions of white noise, the noise series may be visualized using the `ml.plots.diagnostics()` method. This method visualizes the noise series in different ways to test the different assumptions. The method will internally check if a noise model was used during parameter calibration, and select the appropriate residuals series from `ml.residuals()` or `ml.noise()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"alpha = 0.05\n",
"\n",
"ml.plots.diagnostics();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The top-left plot shows the noise time series, which should look more or less random without a clear (seasonal) trend. The title of this plot also includes the number of observations $n$ and the mean value of the modeled noise $\\mu$, which should be around zero. The bottom-left plot shows the autocorrelations for lags up to one year and the 95% confidence interval. Approximately 95% of the autocorrelation values should fall between these boundaries. The upper-right plot shows a histogram of the noise series along with a normal distribution fitted to the data. This plot may be used to assess how well the noise series resemble a normal distribution. The bottom-right plot may also be used to assess the normality of the noise series, using a probability plot of the data.\n",
"\n",
"### ml.stats.diagnostics\n",
"The visual interpretation of the noise series is (clearly) subjective, but still provides a powerful tool to test the noise series and quickly identify any violations of the assumption of white nose. For a more objective evaluation of the model assumptions, hypothesis tests may be used. To perform multiple hypothesis tests on the noise series at once, Pastas provides the `ml.stats.diagnostics()` method as follows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ml.stats.diagnostics(alpha=0.05)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `ml.stats.diagnostics` method returns a Pandas DataFrame with an overview of the results of the different hypothesis tests. The first column (\"Checks\") reports what assumption is tested by a certain test and the second column (\"Statistic\") reports the test statistic that is computed for that test. The probability of each test statistic is reported in the third column (\"P-value\") and the fourth column (\"Reject H0\") reports the result of the hypothesis test. Recall that the Null-hypotheses assume that the data resembles white noise. This means that if $H_0$ is rejected (or `Reject H0 = True`), that test concludes that the data does not agree with one of the assumptions of white noise. The following table provides an overview of the different hypothesis tests that are reported by `ml.stats.diagnostics()`.\n",
"\n",
"\n",
"| Name | Checks | Pastas/Scipy method | Description | Non-equidistant |\n",
"|:-----|:-----|:--------------|:-----------------------------------------|----------------:|\n",
"| Shapiro-Wilk | Normality | `scipy.stats.shapiro`| The Shapiro-Wilk test tests the null hypothesis that the data was drawn from a normal distribution. | Unknown |\n",
"| D'Agostino | Normality | `scipy.stats.normaltest`| This test checks if the noise series comes from a normal distribution (H0 hypothesis). | Unknown |\n",
"| Ljung-Box test| Autocorrelation | `ps.stats.ljung_box`| This test checks whether the autocorrelations of a time series are significantly different from zero.| No |\n",
"| Durbin-Watson test | Autocorrelation | `ps.stats.durbin_watson` | This tests diagnoses for autocorrelation at a lag of one time step. | No |\n",
"| Stoffer-Toloi test | Autocorrelation | `ps.stats.stoffer_toloi`| This test is similar to the Ljung-Box test, but is adapted for missing values | Yes |\n",
"| Runs test | Autocorrelation | `ps.stats.runs_test` | This test checks whether the values of a time series are random without assuming any probability distribution. | Yes |\n",
"\n",
"Some of the test are known to be appropriate to test time series with non-equidistant time steps while others are not. The method `ml.stats.diagnostics()` will check use different test depending on the existence of non-equidistant time steps or not. All methods are also available as separate methods and may also be used to test time series that are not obtained from a Pastas model.\n",
"\n",
"## A closer look at the hypothesis tests\n",
"While the results of `ml.stats.diagnostics` may look straightforward, the interpretation is unfortunately not because the results are highly dependent on the input data. To correctly interpret the hypothesis tests it is particularly important to know whether or not the noise time series have equidistant time steps and how many observations the time series contains. For example, some of the tests are only valid when used on equidistant time series. Other tests are sensitive to too few observation (e.g., Ljung-Box) or too many observations (e.g., Shapiroo-Wilk). In the following sections each of these hypothesis tests is discussed in more detail. To show the functioning of the different hypothesis tests a synthetic time series is created by randomly drawing values from a normal distribution."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"random_seed = np.random.RandomState(12345)\n",
"index = pd.to_datetime(np.arange(3650), unit=\"D\")\n",
"noise = pd.Series(random_seed.normal(0, 1, len(index)), index=index)\n",
"noise.plot(figsize=(12, 2))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Checking for autocorrelation\n",
"The first thing we check is if the values of residual series are independent form each other, or in other words, are not correlated. The correlation of a time series with a lagged version of itself is also referred to as autocorrelation, and we often say that we want to check that there is no significant autocorrelation in the residual time series. The following methods to test for autocorrelation are available in Pastas:\n",
"\n",
"| Name | Pastas method | Description | Non-equidistant |\n",
"|:-----|:--------------|:-----------------------------------------|----------------:|\n",
"| Visualization | `ps.plots.acf` | Visualization of the autocorrelation and its confidence intervals. | Yes |\n",
"| Ljung-Box test| `ps.stats.ljung_box`| This test checks whether the autocorrelations of a time series are significantly different from zero.| No |\n",
"| Durbin-Watson test | `ps.stats.durbin_watson` | This tests diagnoses for autocorrelation at a lag of one time step. | No |\n",
"| Stoffer-Toloi test |`ps.stats.stoffer_toloi`| This test is similar to the Ljung-Box test, but is adapted for missing values | Yes |\n",
"| Runs test |`ps.stats.runs_test` | This test checks whether the values of a time series are random without assuming any probability distribution. | Yes |\n",
"\n",
"Whereas many time series models have equidistant time steps, the residuals of Pastas models may have non-equidistant time steps. To deal with this property, functions have been implemented in Pastas that can deal with non-equidistant time steps [(Rehfeld et al., 2011)](#References). We therefore recommend to use the statistical methods supplied in Pastas, unless the modeler is sure he/she is dealing with equidistant time steps. See the additional Notebook on the autocorrelation function for more details and a proof of concept.\n",
"\n",
"### Visual interpretation of the autocorrelation\n",
"To diagnose the model residuals for autocorrelation we first plot the autocorrelation function (ACF) using the `ps.plots.acf` method and perform a visual interpretation of the models residuals. The created plot shows the autocorrelation function up to a time lag of 250 days. The blue-shaded area denotes the 95\\% confidence interval (1-$\\alpha$). If 95\\% of the autocorrelations fall within this confidence interval (that is, 0.95 $\\cdot$ 250 = ±237 of them), we may conclude that there is no significant autocorrelation in the residuals. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ax = ps.plots.acf(noise, acf_options=dict(bin_width=0.5), figsize=(10, 3), alpha=0.01)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The number of time lags to control for autocorrelation has to be chosen by the modeler, for example based on the knowledge of hydrological processes. For example, evaporation shows a clear yearly cycle and we may expect autocorrelation at lags of one year as a result of this. We therefore recommend to test for autocorrelation for all lags up to a lag of $k_{max}=365$ days here. It is noted here the number of lags $k$ [-] to calculate the autocorrelation for may depend on the time step of the residuals ($\\Delta t$). For example, if daily residuals are available ($\\Delta t = 1$ day), the autocorrelation has to be computed for $k=365$ [-] lags.\n",
"\n",
"### Tests for autocorrelation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"stat, p = ps.stats.ljung_box(noise, lags=15)\n",
"\n",
"if p > alpha:\n",
" print(\n",
" \"Failed to reject the Null-hypothesis, no significant autocorrelation. p =\",\n",
" p.round(2),\n",
" )\n",
"else:\n",
" print(\"Reject the Null-hypothesis. p =\", p.round(2))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dw_stat = ps.stats.durbin_watson(noise)\n",
"\n",
"print(stat)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"stat, p = ps.stats.stoffer_toloi(noise, lags=15, freq=\"D\")\n",
"\n",
"if p > alpha:\n",
" print(\n",
" \"Failed to reject the Null-hypothesis, no significant autocorrelation. p =\",\n",
" p.round(2),\n",
" )\n",
"else:\n",
" print(\"Reject the Null-hypothesis\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"stat, p = ps.stats.runs_test(noise)\n",
"\n",
"if p > alpha:\n",
" print(\n",
" \"Failed to reject the Null-hypothesis, no significant autocorrelation. p =\",\n",
" p.round(2),\n",
" )\n",
"else:\n",
" print(\"Reject the Null-hypothesis\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Checking for Normality\n",
"A common assumption is that the residuals follow a Normal distribution, although in principle it is also possible that the residuals come from another distribution. Testing whether or not a time series may come from a normal distribution is notoriously difficult, especially for larger sample size (e.g., more groundwater level observations). It may therefore not always be easy to objectively determine whether or not the residuals follow a normal distribution. An good initial method to assess the normality of the residuals is to plot a histogram of the residuals and compare that to the theoretical normal distribution, along with a probability plot. The following methods may be used to check the normality of the residual series:\n",
"\n",
"\n",
"| Name | Scipy method | Description | Non-equidistant Time series |\n",
"|:-----|:--------------|:------------|----------------:|\n",
"| Histogram plot | `numpy.histogram` | Plot a histogram of the residuals time series and compare to a normal distribution. | Unknown |\n",
"| Probability plot | `scipy.stats.probplot`| Plot a histogram of the residuals time series and compare to a normal distribution. | Unknown |\n",
"| Shapiro-Wilk |`scipy.stats.shapiro`| The Shapiro-Wilk test tests the null hypothesis that the data was drawn from a normal distribution. | Unknown |\n",
"| D'Agostino |`scipy.stats.normaltest`| This test checks if the noise series comes from a normal distribution (H0 hypothesis). | Unknown |\n",
"\n",
"Shapiro and Wilk [(1965)](#References) developed a test to test if a time series may come from a normal distribution. Implemented in Scipy as `scipy.stats.shapiro`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"stat, p = stats.shapiro(noise)\n",
"\n",
"if p > alpha:\n",
" print(\n",
" \"Failed to reject the Null-hypothesis, residuals may come from Normal distribution. p =\",\n",
" np.round(p, 2),\n",
" )\n",
"else:\n",
" print(\"Reject the Null-hypothesisp =\", np.round(p, 2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"D'Agostino and Pearson [(1973)](#References) developed a test to detect non-normality of a time series. This test is implemented in Scipy as `scipy.stats.normaltest`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"stat, p = stats.normaltest(noise)\n",
"\n",
"if p > alpha:\n",
" print(\n",
" \"Failed to reject the Null-hypothesis, residuals may come from Normal distribution. p =\",\n",
" p.round(2),\n",
" )\n",
"else:\n",
" print(\"Reject the Null-hypothesis. p =\", p.round(2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As the p-value is larger than $\\alpha=0.05$ it is possible that the noise series comes from a normal distribution, so the Null hypothesis (series comes from a normal distribution) is not rejected."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Checking for homoscedasticity\n",
"The second assumption we check is if the residuals are so-called homoscedastic, which means that the values of the residuals are independent of the observed groundwater levels. \n",
"The following tests for homoscedasticity are available:\n",
"\n",
"| Name | Pastas method | Description | Non-equidistant |\n",
"|:-----|:--------------|:----------------------------------|----------------:|\n",
"|Visualization | | Visualization of residuals| Unknown|\n",
"|Engle test| Unavailable | |Unknown|\n",
"|Breusch-Pagaan test| Unavailable | |Unknown|"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.plot(ml.observations(), ml.noise(), marker=\"o\", linestyle=\" \")\n",
"plt.xlabel(\"Simulated Groundwater level [m]\")\n",
"plt.ylabel(\"Model residual [m]\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Testing on non-equidistant residuals time series\n",
"A time series with non-equidistant time steps is created from the synthetic time series. The original time series is resampled using the indices from a observed groundwater level time series with different observation frequencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"random_seed = np.random.RandomState(12345)\n",
"index = pd.to_datetime(np.arange(4.5 * 3650), unit=\"D\")\n",
"noise_long = pd.Series(random_seed.normal(0, 1, len(index)), index=index).loc[\"1990\":]\n",
"\n",
"index = (\n",
" pd.read_csv(\"data/test_index.csv\", parse_dates=True, index_col=0)\n",
" .index.round(\"D\")\n",
" .drop_duplicates()\n",
")\n",
"noise_irregular = noise_long.reindex(index).dropna()\n",
"\n",
"noise_long.plot(figsize=(12, 2), label=\"equidistant time steps\")\n",
"noise_irregular.plot(label=\"non-equidistant time steps\")\n",
"plt.legend(ncol=2);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's run `ps.stats.diagnostics` on both of these time series and look at the differences in the outcomes:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ps.stats.diagnostics(noise_long, nparam=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ps.stats.diagnostics(noise_irregular, nparam=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostic vs. hydrological checking\n",
"The diagnostic checks presented in this Notebook are only part of the checks that could be performed before using a model for different purposes. It is noted here that these checks are part of a larger ranges of checks that may be performed on a Pastas model. We also highly recommend checking the model results using hydrological insights and expert judgment. An additional notebook showing this kind of checks will be added in the future.\n",
"\n",
"## Open Questions\n",
"- How well do the tests for normality and homoscedasticity work for time series with non-equidistant time steps?\n",
"- Could we use the ACF for irregular time steps in combination with Ljung-Box?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"\n",
"- Hipel, K. W., & McLeod, A. I. (1994). Time series modelling of water resources and environmental systems, [Chaper 7: Diagnostic Checking.](http://fisher.stats.uwo.ca/faculty/aim/1994Book/1994-Time-chapter%207.pdf) Elsevier.\n",
"- Ljung, G. and Box, G. (1978). [On a Measure of Lack of Fit in Time Series Models, Biometrika, 65, 297-303.](http://dx.doi.org/10.1093/biomet/65.2.297)\n",
"- Stoffer, D. S., & Toloi, C. M. (1992). [A note on the Ljung—Box—Pierce portmanteau statistic with missing data](https://www.sciencedirect.com/science/article/pii/016771529290112I). Statistics & probability letters, 13(5), 391-396.\n",
"- Durbin, J., & Watson, G. S. (1951). [Testing for serial correlation in least squares regression](https://www.jstor.org/stable/2332325). II. Biometrika, 38(1/2), 159-177.\n",
"- Wald, A., & Wolfowitz, J. (1943). [An exact test for randomness in the non-parametric case based on serial correlation](https://www.jstor.org/stable/pdf/2235925.pdf). The Annals of Mathematical Statistics, 14(4), 378-388.\n",
"- D'Agostino, R. and Pearson, E. S. (1973). [Tests for departure from normality](https://www.jstor.org/stable/2335012), Biometrika, 60, 613-622.\n",
"- Shapiro, S. S., & Wilk, M. B. (1965). [An analysis of variance test for normality (complete samples)](https://www.jstor.org/stable/pdf/2333709.pdf). Biometrika, 52(3/4), 591-611.\n",
"- Rehfeld, K., Marwan, N., Heitzig, J., & Kurths, J. (2011). [Comparison of correlation analysis techniques for irregularly sampled time series](https://npg.copernicus.org/articles/18/389/2011/). Nonlinear Processes in Geophysics, 18(3), 389-404.\n",
"\n",
"## Benchmarking built-in Pastas Methods to Statsmodels methods\n",
"The following code blocks may be used to verify the output from Pastas methods to Statsmodels methods."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# import statsmodels.api as sm\n",
"\n",
"# print(\"Pastas:\", ps.stats.ljung_box(noise_long, lags=15))\n",
"# print(\"Pastas StofferToloi:\", ps.stats.stoffer_toloi(noise_long, lags=15))\n",
"# print(\"Statsmodels:\", sm.stats.acorr_ljungbox(noise_long, lags=[15], return_df=False))\n",
"\n",
"# acf = sm.tsa.acf(noise_long, unbiased=True, fft=True, nlags=15)[1:]\n",
"# q, p = sm.tsa.q_stat(acf, noise.size)\n",
"# print(\"Statsmodels:\", q[-1], p[-1])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# print(\"Pastas:\", ps.stats.durbin_watson(noise)[0].round(2))\n",
"# print(\"Statsmodels:\", sm.stats.durbin_watson(noise).round(2))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# print(\"Pastas:\", ps.stats.runs_test(noise))\n",
"# print(\"Statsmodels:\", sm.stats.runstest_1samp(noise))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "pastas_dev",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:41:22) [MSC v.1929 64 bit (AMD64)]"
},
"vscode": {
"interpreter": {
"hash": "29475f8be425919747d373d827cb41e481e140756dd3c75aa328bf3399a0138e"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}