R.A. Collenteur, University of Graz, May 2020
In this notebook the classical Autoregressive AR1 noise model is tested for Pastas models. This noise model is tested against synthetic data generated with Numpy or Statsmodels’ ARMA model. This noise model is tested on head time series with a regular time step.
[1]:
import numpy as np import pandas as pd import matplotlib.pyplot as plt import pastas as ps ps.set_log_level("ERROR") ps.show_versions(numba=True)
Python version: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:37:09) [Clang 10.0.1 ] Numpy version: 1.18.5 Scipy version: 1.4.0 Pandas version: 1.1.2 Pastas version: 0.16.0b Matplotlib version: 3.1.3 numba version: 0.49.1
The following formula is used to calculate the noise according to the ARMA(1,1) process:
where \(\upsilon\) is the noise, \(\Delta t_i\) the time step between the residuals (\(r\)), and \(\alpha\) [days] the AR parameter of the model. The model is named NoiseModel and can be found in noisemodel.py. This is the default noise model that is added to a Pastas model, but it can also be added manually as follows: ml.add_noisemodel(ps.NoiseModel())
NoiseModel
noisemodel.py
ml.add_noisemodel(ps.NoiseModel())
[2]:
# Read in some data rain = ps.read.read_knmi('../examples/data/etmgeg_260.txt', variables='RH').series evap = ps.read.read_knmi('../examples/data/etmgeg_260.txt', variables='EV24').series # Set the True parameters Atrue = 800 ntrue = 1.4 atrue = 200 dtrue = 20 # Generate the head step = ps.Gamma().block([Atrue, ntrue, atrue], cutoff=0.9999) h = dtrue * np.ones(len(rain) + step.size) for i in range(len(rain)): h[i:i + step.size] += rain[i] * step head = pd.DataFrame(index=rain.index, data=h[:len(rain)],) head = head['1990':'2015'] # Plot the head without noise plt.figure(figsize=(10,2)) plt.plot(head,'k.', label='head') plt.legend(loc=0) plt.ylabel('Head (m)') plt.xlabel('Time (years)');
In the following code-block, noise is generated using an AR(1) process using Numpy.
[3]:
# reproduction of random numbers np.random.seed(1234) alpha= 0.8 # generate samples using Numpy random_seed = np.random.RandomState(1234) noise = random_seed.normal(0,1,len(head)) * np.std(head.values) * 0.2 a = np.zeros_like(head[0]) for i in range(1, noise.size): a[i] = noise[i] + a[i - 1] * alpha head_noise = head[0] + a
[4]:
ml = ps.Model(head_noise) sm = ps.StressModel(rain, ps.Gamma, name='recharge', settings='prec') ml.add_stressmodel(sm) ml.add_noisemodel(ps.NoiseModel()) ml.solve(tmin="1991", tmax='2015-06-29', noise=True, report=False) # Plot the results axes = ml.plots.results(figsize=(10,5)); axes[-1].plot(ps.Gamma().step([Atrue, ntrue, atrue], cutoff=0.999))
[<matplotlib.lines.Line2D at 0x7fcc7d5aded0>]
[5]:
print(np.exp(-1./ml.parameters.loc["noise_alpha", "optimal"]).round(2), "vs", alpha)
0.8 vs 0.8
The estimated parameters for the noise model are almost the same as the true parameters, showing that the model works for regular time steps.
[6]:
ml.plots.diagnostics(figsize=(10,4));
That seems okay. It is important to understand that this noisemodel will only help in removing autocorrelations at the first time lag, but not at larger time lags, compared to its AR(1) counterpart.
In this final step the ArmaModel is tested for irregular timesteps, using the indices from a real groundwater level time series.
[7]:
index = pd.read_csv("../examples/data/test_index.csv", parse_dates=True, index_col=0).index.round("D").drop_duplicates() head_irregular = head_noise.reindex(index)
[8]:
ml = ps.Model(head_irregular) sm = ps.StressModel(rain, ps.Gamma, name='recharge', settings='prec', cutoff=0.99) ml.add_stressmodel(sm) ml.add_noisemodel(ps.NoiseModel()) # ml.set_parameter("noise_alpha", 9.5, vary=False) # ml.set_parameter("recharge_A", 800, vary=False) # ml.set_parameter("recharge_n", 1.2, vary=False) # ml.set_parameter("recharge_a", 200, vary=False) ml.solve(tmin="1991", tmax='2015-06-29', noise=True, report=False) axes = ml.plots.results(figsize=(10,5)); axes[-1].plot(ps.Gamma().step([Atrue, ntrue, atrue])) print(np.exp(-1./ml.parameters.loc["noise_alpha", "optimal"]).round(2), "vs", alpha)
0.78 vs 0.8
[9]:
ml.plots.diagnostics(figsize=(10,4), acf_options={"bin_width": 0.5});
The above autocorrelation plot shows that there is some autocorrelation in the first time steps. However, this is probably more a numerical artifict due to estimating the ACF for irregular time steps than “real” autocorrelation. This can be shown when estimating the ACF for the period with more frequent observations.
[10]:
v = ml.noise() print("", v.loc[:"2010-01-01"].std()) print("", v.loc["2010-01-01":].std())
0.08935783732567387 0.06053749108377452