A Basic Model#

In this example application it is shown how a simple time series model can be developed to simulate groundwater levels. The recharge (calculated as precipitation minus evaporation) is used as the explanatory time series.

import matplotlib.pyplot as plt
import pandas as pd

import pastas as ps

ps.show_versions()

/tmp/ipykernel_896/1365457298.py:2: DeprecationWarning: 
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd

Python version: 3.11.6
NumPy version: 1.26.4
Pandas version: 2.2.0
SciPy version: 1.12.0
Matplotlib version: 3.8.3
Numba version: 0.59.0
LMfit version: 1.2.2
Latexify version: Not Installed
Pastas version: 1.4.0

1. Importing the dependent time series data#

In this codeblock a time series of groundwater levels is imported using the read_csv function of pandas. As pastas expects a pandas Series object, the data is squeezed. To check if you have the correct data type (a pandas Series object), you can use type(oseries) as shown below.

The following characteristics are important when importing and preparing the observed time series:

The observed time series are stored as a pandas Series object.
The time step can be irregular.

# Import groundwater time seriesm and squeeze to Series object
gwdata = pd.read_csv(
    "data/head_nb1.csv", parse_dates=["date"], index_col="date"
).squeeze()
print("The data type of the oseries is: %s" % type(gwdata))

# Plot the observed groundwater levels
gwdata.plot(style=".", figsize=(10, 4))
plt.ylabel("Head [m]")
plt.xlabel("Time [years]")

The data type of the oseries is: <class 'pandas.core.series.Series'>

Text(0.5, 0, 'Time [years]')

../_images/f632b93d250086e553bd5745aad6e45d0cf25f749c3bbe2ebdb06e030cb09955.png

2. Import the independent time series#

Two explanatory series are used: the precipitation and the potential evaporation. These need to be pandas Series objects, as for the observed heads.

Important characteristics of these time series are:

All series are stored as pandas Series objects.
The series may have irregular time intervals, but then it will be converted to regular time intervals when creating the time series model later on.
It is preferred to use the same length units as for the observed heads.

# Import observed precipitation series
precip = pd.read_csv(
    "data/rain_nb1.csv", parse_dates=["date"], index_col="date"
).squeeze()
print("The data type of the precip series is: %s" % type(precip))

# Import observed evaporation series
evap = pd.read_csv(
    "data/evap_nb1.csv", parse_dates=["date"], index_col="date"
).squeeze()
print("The data type of the evap series is: %s" % type(evap))

# Calculate the recharge to the groundwater
recharge = precip - evap
recharge.name = "recharge"  # set name if pandas series
print("The data type of the recharge series is: %s" % type(recharge))

# Plot the time series of the precipitation and evaporation
plt.figure()
recharge.plot(label="Recharge", figsize=(10, 4))
plt.xlabel("Time [years]")
plt.ylabel("Recharge (m/year)")

The data type of the precip series is: <class 'pandas.core.series.Series'>
The data type of the evap series is: <class 'pandas.core.series.Series'>
The data type of the recharge series is: <class 'pandas.core.series.Series'>

Text(0, 0.5, 'Recharge (m/year)')

../_images/9f945bfc4e15c5c600256089099a5d11bfadc933aee8d04c8f531d65b693e50d.png

3. Create the time series model#

In this code block the actual time series model is created. First, an instance of the Model class is created (named ml here). Second, the different components of the time series model are created and added to the model. The imported time series are automatically checked for missing values and other inconsistencies. The keyword argument fillnan can be used to determine how missing values are handled. If any nan-values are found this will be reported by pastas.

# Create a model object by passing it the observed series
ml = ps.Model(gwdata, name="GWL")

# Add the recharge data as explanatory variable
sm = ps.StressModel(recharge, ps.Gamma(), name="recharge", settings="evap")
ml.add_stressmodel(sm)

WARNING: The Time Series 'recharge' has nan-values. Pastas will use the fill_nan settings to fill up the nan-values.

/home/docs/checkouts/readthedocs.org/user_builds/pastas/envs/v1.4.0/lib/python3.11/site-packages/pastas/timeseries_utils.py:90: FutureWarning: Day.delta is deprecated and will be removed in a future version. Use pd.Timedelta(obj) instead
  if hasattr(offset, "delta"):
INFO: Time Series 'recharge': 22 nan-value(s) was/were found and filled with: interpolate.

INFO: Time Series 'recharge': 22 nan-value(s) was/were found and filled with: interpolate.

4. Solve the model#

The next step is to compute the optimal model parameters. The default solver uses a non-linear least squares method for the optimization. The python package scipy is used (info on scipy's least_squares solver can be found here). Some standard optimization statistics are reported along with the optimized parameter values and correlations.

ml.solve()

INFO: Time Series 'recharge': 22 nan-value(s) was/were found and filled with: interpolate.

INFO: Time Series 'recharge' was extended in the past to 1975-11-17 00:00:00 with the mean value (0.00048) of the time series.

Fit report GWL                         Fit Statistics
=====================================================
nfev    21                     EVP              91.28
nobs    644                    R2                0.91
noise   True                   RMSE              0.13
tmin    1985-11-14 00:00:00    AIC           -3234.20
tmax    2015-06-28 00:00:00    BIC           -3211.87
freq    D                      Obj               2.09
warmup  3650 days 00:00:00     ___                   
solver  LeastSquares           Interp.             No

Parameters (5 optimized)
=====================================================
                optimal     initial  vary      stderr
recharge_A   753.599130  215.674528  True      ±5.17%
recharge_n     1.054413    1.000000  True      ±1.50%
recharge_a   135.887588   10.000000  True      ±7.05%
constant_d    27.552229   27.900078  True  ±8.00e-02%
noise_alpha   61.765719   15.000000  True     ±12.66%

5. Plot the results#

The solution can be plotted after a solution has been obtained.

ml.plot()

<Axes: title={'center': 'Results of GWL'}, xlabel='date', ylabel='Groundwater levels [meter]'>

../_images/8b37fc7000e07deecb3e4e864b3293527efd6857de93a7659184352328f360c3.png

6. Advanced plotting#

There are many ways to further explore the time series model. pastas has some built-in functionalities that will provide the user with a quick overview of the model. The plots subpackage contains all the options. One of these is the method plots.results which provides a plot with more information.

ml.plots.results(figsize=(10, 6))

[<Axes: xlabel='date'>,
 <Axes: xlabel='date'>,
 <Axes: title={'right': "Stresses: ['recharge']"}>,
 <Axes: title={'center': 'Step response'}, xlabel='Time [days]'>,
 <Axes: title={'left': 'Model Parameters ($n_c$=5)'}>]

../_images/242c34d6609de0c533ef1d923ecd674e20c9a0bb79d7f90ff6006ef27518c057.png

7. Statistics#

The stats subpackage includes a number of statistical functions that may applied to the model. One of them is the summary method, which gives a summary of the main statistics of the model.

ml.stats.summary()

	Value
Statistic
rmse	0.126901
rmsn	0.080897
sse	10.370911
mae	0.101293
nse	0.912832
evp	91.283323
rsq	0.912832
bic	-3211.865371
aic	-3234.203865

8. Improvement: estimate evaporation factor#

In the previous model, the recharge was estimated as precipitation minus potential evaporation. A better model is to estimate the actual evaporation as a factor (called the evaporation factor here) times the potential evaporation. First, new model is created (called ml2 here so that the original model ml does not get overwritten). Second, the RechargeModel object with a Linear recharge model is created, which combines the precipitation and evaporation series and adds a parameter for the evaporation factor f. The RechargeModel object is added to the model, the model is solved, and the results and statistics are plotted to the screen. Note that the new model gives a better fit (lower root mean squared error and higher explained variance), but that the Akiake information criterion indicates that the addition of the additional parameter does not improve the model signficantly (the Akaike criterion for model ml2 is higher than for model ml).

# Create a model object by passing it the observed series
ml2 = ps.Model(gwdata)

# Add the recharge data as explanatory variable
ts1 = ps.RechargeModel(
    precip,
    evap,
    ps.Gamma(),
    name="rainevap",
    recharge=ps.rch.Linear(),
    settings=("prec", "evap"),
)
ml2.add_stressmodel(ts1)

# Solve the model
ml2.solve()

# Plot the results
ml2.plot()

# Statistics
ml2.stats.summary()

/home/docs/checkouts/readthedocs.org/user_builds/pastas/envs/v1.4.0/lib/python3.11/site-packages/pastas/timeseries_utils.py:90: FutureWarning: Day.delta is deprecated and will be removed in a future version. Use pd.Timedelta(obj) instead
  if hasattr(offset, "delta"):
INFO: Time Series 'rain' was extended in the past to 1975-11-17 00:00:00 with the mean value (0.0021) of the time series.

INFO: Time Series 'evap' was extended in the past to 1975-11-17 00:00:00 with the mean value (0.0016) of the time series.

Fit report head                     Fit Statistics
==================================================
nfev    23                     EVP           92.90
nobs    644                    R2             0.93
noise   True                   RMSE           0.11
tmin    1985-11-14 00:00:00    AIC        -3256.86
tmax    2015-06-28 00:00:00    BIC        -3230.05
freq    D                      Obj            2.01
warmup  3650 days 00:00:00     ___                
solver  LeastSquares           Interp.          No

Parameters (6 optimized)
==================================================
                optimal     initial  vary   stderr
rainevap_A   682.469524  215.674528  True   ±5.24%
rainevap_n     1.018207    1.000000  True   ±1.78%
rainevap_a   150.381156   10.000000  True   ±7.48%
rainevap_f    -1.271043   -1.000000  True   ±4.77%
constant_d    27.882277   27.900078  True   ±0.24%
noise_alpha   50.095051   15.000000  True  ±11.90%

	Value
Statistic
rmse	0.114492
rmsn	0.079510
sse	8.441829
mae	0.090257
nse	0.929046
evp	92.904642
rsq	0.929046
bic	-3230.051307
aic	-3256.857499

../_images/fc2ad64f0b15c6ffd51a958ecf7821a5bdaad8caa6a24e3ac64682b28f28734b.png

Origin of the series#

The rainfall data is taken from rainfall station Heibloem in The Netherlands.
The evaporation data is taken from weather station Maastricht in The Netherlands.
The head data is well B58C0698, which was obtained from Dino loket