R.A. Collenteur, University of Graz, May 2020
In this notebook an Autoregressive-Moving-Average (ARMA(1,1)) noise model is developed for Pastas models. This new noise model is tested against synthetic data generated with Numpy or Statsmodels’ ARMA model. This noise model is tested on head time series with a regular time step.
Warning
It should be noted that the time step may be non-equidistant in this formulation, but this model is not yet tested for irregular time steps.
[1]:
import numpy as np import pandas as pd import matplotlib.pyplot as plt from scipy.special import gammainc, gammaincinv import pastas as ps ps.set_log_level("ERROR") ps.show_versions(numba=True)
Python version: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:37:09) [Clang 10.0.1 ] Numpy version: 1.18.5 Scipy version: 1.4.0 Pandas version: 1.1.2 Pastas version: 0.16.0b Matplotlib version: 3.1.3 numba version: 0.49.1
The following formula is used to calculate the noise according to the ARMA(1,1) process:
where \(\upsilon\) is the noise, \(\Delta t_i\) the time step between the residuals (\(r\)), and respectively \(\alpha\) [days] and \(\beta\) [days] the parameters of the AR and MA parts of the model. The model is named ArmaModel and can be found in noisemodel.py. It is added to a Pastas model as follows: ml.add_noisemodel(ps.ArmaModel())
ArmaModel
noisemodel.py
ml.add_noisemodel(ps.ArmaModel())
[2]:
# Read in some data rain = ps.read.read_knmi('../examples/data/etmgeg_260.txt', variables='RH').series evap = ps.read.read_knmi('../examples/data/etmgeg_260.txt', variables='EV24').series # Set the True parameters Atrue = 800 ntrue = 1.1 atrue = 200 dtrue = 20 # Generate the head step = ps.Gamma().block([Atrue, ntrue, atrue]) h = dtrue * np.ones(len(rain) + step.size) for i in range(len(rain)): h[i:i + step.size] += rain[i] * step head = pd.DataFrame(index=rain.index, data=h[:len(rain)],) head = head['1990':'2015'] # Plot the head without noise plt.figure(figsize=(10,2)) plt.plot(head,'k.', label='head') plt.legend(loc=0) plt.ylabel('Head (m)') plt.xlabel('Time (years)');
In the following code-block, noise is generated using an ARMA(1,1) process using Numpy. An alternative procedure is available from Statsmodels (commented out now). More information about the ARMA model can be found on the statsmodels website. The noise is added to the head series generated in the previous code-block.
[3]:
# reproduction of random numbers np.random.seed(1234) alpha= 0.95 beta = 0.1 # generate samples using Statsmodels # import statsmodels.api as stats # ar = np.array([1, -alpha]) # ma = np.r_[1, beta] # arma = stats.tsa.ArmaProcess(ar, ma) # noise = arma.generate_sample(head[0].index.size)*np.std(head.values) * 0.1 # generate samples using Numpy random_seed = np.random.RandomState(1234) noise = random_seed.normal(0,1,len(head)) * np.std(head.values) * 0.1 a = np.zeros_like(head[0]) for i in range(1, noise.size): a[i] = noise[i] + noise[i - 1] * beta + a[i - 1] * alpha head_noise = head[0] + a
[4]:
plt.plot(a) plt.plot(noise)
[<matplotlib.lines.Line2D at 0x7fe70827eed0>]
[5]:
ml = ps.Model(head_noise) sm = ps.StressModel(rain, ps.Gamma, name='recharge', settings='prec') ml.add_stressmodel(sm) ml.add_noisemodel(ps.ArmaModel()) ml.solve(tmin="1991", tmax='2015-06-29', noise=True, report=True) axes = ml.plots.results(figsize=(10,5)); axes[-1].plot(ps.Gamma().step([Atrue, ntrue, atrue]))
Fit report 0 Fit Statistics ============================================ nfev 24 EVP 88.35 nobs 8946 R2 0.88 noise True RMSE 0.11 tmin 1991-01-01 00:00:00 AIC 2.71 tmax 2015-06-29 00:00:00 BIC 45.31 freq D Obj 4.03 warmup 3650 days 00:00:00 ___ solver LeastSquares Interpolated No Parameters (6 were optimized) ============================================ optimal stderr initial vary recharge_A 735.249 ±3.15% 217.314 True recharge_n 1.08908 ±0.88% 1 True recharge_a 191.753 ±4.97% 10 True constant_d 20.1752 ±0.27% 21.8737 True noise_alpha 20.5025 ±7.15% 10 True noise_beta 0.433964 ±4.62% 10 True Parameter correlations |rho| > 0.5 ============================================ recharge_A recharge_a 0.61 constant_d -0.99 recharge_n recharge_a -0.71 recharge_a constant_d -0.59
[<matplotlib.lines.Line2D at 0x7fe70cfa7e10>]
[6]:
print(np.exp(-1./ml.parameters.loc["noise_alpha", "optimal"]).round(2), "vs", alpha) print(np.exp(-1./np.abs(ml.parameters.loc["noise_beta", "optimal"])).round(2), "vs.", beta)
0.95 vs 0.95 0.1 vs. 0.1
The estimated parameters for the noise model are almost the same as the true parameters, showing that the model works for regular time steps.
[7]:
ml.plots.diagnostics(figsize=(10,4));
That seems okay. It is important to understand that this noisemodel will only help in removing autocorrelations at the first time lag, but not at larger time lags, compared to its AR(1) counterpart.
[8]:
ml = ps.Model(head_noise) sm = ps.StressModel(rain, ps.Gamma, name='recharge', settings='prec') ml.add_stressmodel(sm) ml.solve(tmin="1991", tmax='2015-06-29', noise=True, report=False) axes = ml.plots.results(figsize=(10,5)); axes[-1].plot(ps.Gamma().step([Atrue, ntrue, atrue])) print(np.exp(-1./ml.parameters.loc["noise_alpha", "optimal"]).round(2), "vs", alpha)
0.96 vs 0.95
[9]:
Significant autocorrelation is still present at lag 1 and the parameter of the AR(1) is overestimated, trying to correct for the lack of an MA(1) part. This is to be expected, as the MA(1) process generates a strong autocorrelation at this first time lag. The negative autocorrelation in the first few time steps is a result of the overestimation of the AR(1) parameter.
A possible effect of failing to remove the autocorrelation at lag 1 may be that the parameter standard errors are under- or overestimated. Although that does not seem the case for this synthetic, real life examples may suffer from this.
In this final step the ArmaModel is tested for irregular timesteps, using the indices from a real groundwater level time series. It is clear from the example below that the ArmaModel does not yet work for irregular time steps, as (unlike the AR(1) model in Pastas) no weights are applied yet.
[10]:
index = pd.read_csv("../examples/data/test_index.csv", parse_dates=True, index_col=0).index.round("D").drop_duplicates() head_irregular = head_noise.reindex(index)
[11]:
ml = ps.Model(head_irregular) sm = ps.StressModel(rain, ps.Gamma, name='recharge', settings='prec') ml.add_stressmodel(sm) ml.add_noisemodel(ps.ArmaModel()) ml.solve(tmin="1991", tmax='2015-06-29', noise=True, report=False) axes = ml.plots.results(figsize=(10,5)); axes[-1].plot(ps.Gamma().step([Atrue, ntrue, atrue])) print(np.exp(-1./ml.parameters.loc["noise_alpha", "optimal"]).round(2), "vs", alpha) print(np.exp(-1./ml.parameters.loc["noise_beta", "optimal"]).round(2), "vs.", beta)
0.95 vs 0.95 0.08 vs. 0.1
[12]: