Testing an AR(1) Noise Model for Pastas#

R.A. Collenteur, University of Graz, May 2020

In this notebook the classical Autoregressive AR(1) noise model is tested for Pastas models. This noise model is tested against synthetic data generated with NumPy or Statsmodels’ ARMA model. This noise model is tested on head time series with regular and irregular time steps.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import pastas as ps

ps.set_log_level("ERROR")
ps.show_versions()
/tmp/ipykernel_766/27909940.py:2: DeprecationWarning: 
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd
Python version: 3.11.6
NumPy version: 1.26.4
Pandas version: 2.2.0
SciPy version: 1.12.0
Matplotlib version: 3.8.3
Numba version: 0.59.0
LMfit version: 1.2.2
Latexify version: Not Installed
Pastas version: 1.4.0

1. Develop the AR(1) Noise Model for Pastas#

The following formula is used to calculate the noise according to the ARMA(1,1) process:

$$\upsilon(t_i) = r(t_i) - r(t_{i-1}) \text{e}^{-\Delta t_i / \alpha}$$

where $\upsilon$ is the noise, $\Delta t_i$ the time step between the residuals ($r$), and $\alpha$ [days] the AR parameter of the model. The model is named NoiseModel and can be found in noisemodel.py. This is the default noise model that is added to a Pastas model, but it can also be added manually as follows: ml.add_noisemodel(ps.NoiseModel())

2. Generate synthetic head time series#

# Read in some data
rain = (
    pd.read_csv("../examples/data/rain_260.csv", index_col=0, parse_dates=[0]).squeeze()
    / 1000
)

# Set the True parameters
Atrue = 800
ntrue = 1.4
atrue = 200
dtrue = 20

# Generate the head
step = ps.Gamma().block([Atrue, ntrue, atrue], cutoff=0.9999)
h = dtrue * np.ones(len(rain) + step.size)
for i in range(len(rain)):
    h[i : i + step.size] += rain[i] * step
head = pd.DataFrame(
    index=rain.index,
    data=h[: len(rain)],
)
head = head["1990":"2015"]

# Plot the head without noise
plt.figure(figsize=(10, 2))
plt.plot(head, "k.", label="head")
plt.legend(loc=0)
plt.ylabel("Head (m)")
plt.xlabel("Time (years)");
/tmp/ipykernel_766/3724625853.py:17: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  h[i : i + step.size] += rain[i] * step
../_images/e65f7f816f3bf85fb59aebc878af48dde5bf74e17556898914d19e94309fb49b.png

3. Generate AR(1) noise and add it to the synthetic heads#

In the following code-block, noise is generated using an AR(1) process using Numpy.

# reproduction of random numbers
np.random.seed(1234)
alpha = 0.8

# generate samples using NumPy
random_seed = np.random.RandomState(1234)
noise = random_seed.normal(0, 1, len(head)) * np.std(head.values) * 0.2
a = np.zeros_like(head[0])

for i in range(1, noise.size):
    a[i] = noise[i] + a[i - 1] * alpha

head_noise = head[0] + a

4. Create and solve a Pastas Model#

ml = ps.Model(head_noise)
sm = ps.StressModel(rain, ps.Gamma(), name="recharge", settings="prec")
ml.add_stressmodel(sm)
ml.add_noisemodel(ps.NoiseModel())

ml.solve(tmin="1991", tmax="2015-06-29", noise=True, report=False)

# Plot the results
axes = ml.plots.results(figsize=(10, 5))
axes[-2].plot(ps.Gamma().step([Atrue, ntrue, atrue], cutoff=0.999));
/home/docs/checkouts/readthedocs.org/user_builds/pastas/envs/v1.4.0/lib/python3.11/site-packages/pastas/timeseries_utils.py:90: FutureWarning: Day.delta is deprecated and will be removed in a future version. Use pd.Timedelta(obj) instead
  if hasattr(offset, "delta"):
/home/docs/checkouts/readthedocs.org/user_builds/pastas/envs/v1.4.0/lib/python3.11/site-packages/pastas/timeseries_utils.py:90: FutureWarning: Day.delta is deprecated and will be removed in a future version. Use pd.Timedelta(obj) instead
  if hasattr(offset, "delta"):
../_images/b2d2dce6b9cca17ce23270e5c6b577ee59203c2ce4ea8124ddc3eec7db8a665f.png

5. Did we find back the original AR parameter?#

print(np.exp(-1.0 / ml.parameters.loc["noise_alpha", "optimal"]).round(2), "vs", alpha)
0.8 vs 0.8

The estimated parameters for the noise model are almost the same as the true parameters, showing that the model works for regular time steps.

6. So is the autocorrelation removed correctly?#

ml.plots.diagnostics(figsize=(10, 4));
../_images/43da06104815d068d8ec641ebb61d36c40c8497286ec244b3c81c6b194b7b773.png

That seems okay. It is important to understand that this noisemodel will only help in removing autocorrelations at the first time lag, but not at larger time lags.

7. Test the NoiseModel for irregular time steps#

In this final step the Ar(1) noisemodel is tested for irregular timesteps, using the indices from a real groundwater level time series.

index = (
    pd.read_csv("../examples/data/test_index.csv", parse_dates=True, index_col=0)
    .index.round("D")
    .drop_duplicates()
)
head_irregular = head_noise.reindex(index)
ml = ps.Model(head_irregular)
sm = ps.StressModel(rain, ps.Gamma(cutoff=0.99), name="recharge", settings="prec")
ml.add_stressmodel(sm)
ml.add_noisemodel(ps.NoiseModel())

# ml.set_parameter("noise_alpha", 9.5, vary=False)
# ml.set_parameter("recharge_A", 800, vary=False)
# ml.set_parameter("recharge_n", 1.2, vary=False)
# ml.set_parameter("recharge_a", 200, vary=False)

ml.solve(tmin="1991", tmax="2015-06-29", noise=True, report=False)
axes = ml.plots.results(figsize=(10, 5))
axes[-2].plot(ps.Gamma().step([Atrue, ntrue, atrue]))

print(np.exp(-1.0 / ml.parameters.loc["noise_alpha", "optimal"]).round(2), "vs", alpha)
/home/docs/checkouts/readthedocs.org/user_builds/pastas/envs/v1.4.0/lib/python3.11/site-packages/pastas/timeseries_utils.py:90: FutureWarning: Day.delta is deprecated and will be removed in a future version. Use pd.Timedelta(obj) instead
  if hasattr(offset, "delta"):
0.78 vs 0.8
../_images/3b24b85dff6c5769f0e620be995a288e56b77a0b96a0882c7fd8f5ceeac167db.png
ml.plots.diagnostics(figsize=(10, 4), acf_options={"bin_width": 0.5});
../_images/f9d1681ac08c413d15973efb823ea8974fca1ea2822f0bef581f9d864b7f0f01.png

This autocorrelation plot looks good too.

v = ml.noise()
print("", v.loc[:"2010-01-01"].std())
print("", v.loc["2010-01-01":].std())
 0.0893576750755199
 0.0605375503295119