8. Pastas TimeSeries
Developed by Raoul Collenteur
In this Jupyter Notebook, the concept of the pastas.TimeSeries class is explained in full detail.
Objective of the Pastas TimeSeries class:
“To create one class that deals with all user-provided time series and the manipulations of the series while maintaining the original series.”
Desired capabilities: The central idea behind the TimeSeries class is to handle all data manipulations in a single class while maintaining the original time series. While the TimeSeries is manipulated during work with a Pastas model, the original data are kept, so that only the settings and the original series need to be stored.
- Validate user-provided time series
- Extend before and after
- Fill nan-values
- Change frequency
  - Upsample
  - Downsample
- Normalize values
Resources:
- The definition of the class can be found on GitHub (pastas/pastas).
- Documentation on the Pandas Series can be found here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html
[1]:
# Import some packages
import pastas as ps
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
ps.show_versions()
Python version: 3.10.8
NumPy version: 1.23.5
Pandas version: 1.5.3
SciPy version: 1.10.0
Matplotlib version: 3.6.3
Numba version: 0.56.4
LMfit version: 1.1.0
Pastas version: 0.23.1
8.1. 1. Importing groundwater time series
Let’s first import some time series so we have some data to play around with. We use the Pandas read_csv function and obtain a Pandas Series object, a pandas data structure for dealing efficiently with 1D time series data. By default, Pandas adds a wealth of functionality to a Series object, such as descriptive statistics (e.g. series.describe()) and plotting functionality.
[2]:
gwdata = pd.read_csv(
"../examples/data/head_nb1.csv", parse_dates=["date"], index_col="date"
).squeeze()
gwdata.plot(figsize=(15, 4));
8.2. 2. Creating a Pastas TimeSeries object
The user will provide time series data when creating a model instance, or one of the stressmodels found in stressmodels.py. Pastas expects Pandas Series as the standard format in which time series are provided, but internally transforms these to Pastas TimeSeries objects to add the necessary functionality. It is therefore also possible to provide a TimeSeries object directly instead of a Pandas Series object.
We will now create a TimeSeries object for the groundwater level (gwdata). When creating a TimeSeries object the time series that are provided are validated, such that Pastas can use the provided time series for simulation without errors. The time series are checked for:
- Being an actual Pandas Series object;
- Making sure the indices are all TimeStamps;
- Making sure the indices are ordered in time;
- Dropping any nan-values before and after the first and final valid value;
- Frequency of the Series is inferred, or otherwise the user-provided value for “freq” is applied;
- Nan-values within the series are handled, depending on the value for the “fill_nan” argument;
- Duplicate indices are dropped from the series.
If all of the above is OK, a TimeSeries object is returned. When valid time series are provided, all of the above checks pass and no settings are required. However, all too often this is not the case and at least “fill_nan” and “freq” are required. The fill_nan argument tells the TimeSeries object how to handle nan-values, and the freq argument provides the frequency of the original time series (by default, freq=“D”, fill_nan=“interpolate”).
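The checks listed above can be approximated with plain pandas. The following is a minimal sketch, not the Pastas implementation: the function name validate_series is hypothetical, the order of the steps is chosen for convenience, and only the "drop" and "interpolate" values of fill_nan are covered.

```python
import numpy as np
import pandas as pd


def validate_series(series, fill_nan="interpolate"):
    """Hypothetical helper that mimics the listed checks with plain pandas."""
    # The input must be a pandas Series with timestamps as index.
    if not isinstance(series, pd.Series):
        raise TypeError("Expected a pandas Series.")
    if not isinstance(series.index, pd.DatetimeIndex):
        raise TypeError("Expected a DatetimeIndex.")
    # Drop duplicate indices (keep the first occurrence), then order in time.
    series = series[~series.index.duplicated(keep="first")].sort_index()
    # Trim nan-values before the first and after the last valid value.
    series = series.loc[series.first_valid_index() : series.last_valid_index()]
    # Handle the nan-values that remain inside the series.
    if fill_nan == "drop":
        series = series.dropna()
    elif fill_nan == "interpolate":
        series = series.interpolate(method="time")
    else:
        raise ValueError("This sketch only supports 'drop' and 'interpolate'.")
    # Infer the frequency; pandas needs at least three timestamps for this
    # and returns None when the time steps are irregular.
    freq = pd.infer_freq(series.index) if len(series) >= 3 else None
    return series, freq


# Unordered index, one duplicate date, and nan-values at both ends and inside.
index = pd.to_datetime(
    [
        "2020-01-03",
        "2020-01-01",
        "2020-01-02",
        "2020-01-02",
        "2020-01-04",
        "2020-01-05",
        "2020-01-06",
    ]
)
raw = pd.Series([3.0, np.nan, 2.0, 9.0, np.nan, 5.0, np.nan], index=index)

interpolated, freq_interpolated = validate_series(raw)
dropped, freq_dropped = validate_series(raw, fill_nan="drop")
print(interpolated)
print(freq_interpolated, freq_dropped)
```

With fill_nan="interpolate" the gap is filled and a regular daily frequency can be inferred. With fill_nan="drop" the gap remains, so no regular frequency is found.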
[3]:
oseries = ps.TimeSeries(gwdata, name="Groundwater Level")
# Plot the new time series and the original
plt.figure(figsize=(10, 4))
oseries.plot(label="pastas timeseries")
gwdata.plot(label="original")
plt.legend()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[3], line 1
----> 1 oseries = ps.TimeSeries(gwdata, name="Groundwater Level")
3 # Plot the new time series and the original
4 plt.figure(figsize=(10, 4))
AttributeError: module 'pastas' has no attribute 'TimeSeries'
8.3. 3. Configuring a TimeSeries object
So let’s see how we can configure a TimeSeries object. In the case of the observed groundwater levels (oseries) as in the example above, interpolating between observations might not be the preferred method to deal with gaps in your data. In fact, the time steps between observations do not have to be constant for simulation, which is one of the benefits of the method of impulse response functions. The nan-values can simply be dropped. To configure a TimeSeries object the user has three options:
1. Use a predefined configuration by providing a string to the settings argument
2. Manually set all or some of the settings by providing a dictionary to the settings argument
3. Provide the arguments as keyword arguments to the TimeSeries object (not recommended)
For example, when creating a TimeSeries object for the groundwater levels, consider the following three examples for setting the fill_nan option:
[4]:
# Options 1
oseries = ps.TimeSeries(gwdata, name="Groundwater Level", settings="oseries")
print(oseries.settings)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[4], line 2
1 # Options 1
----> 2 oseries = ps.TimeSeries(gwdata, name="Groundwater Level", settings="oseries")
3 print(oseries.settings)
AttributeError: module 'pastas' has no attribute 'TimeSeries'
[5]:
# Option 2
oseries = ps.TimeSeries(
gwdata, name="Groundwater Level", settings=dict(fill_nan="drop")
)
print(oseries.settings)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[5], line 2
1 # Option 2
----> 2 oseries = ps.TimeSeries(
3 gwdata, name="Groundwater Level", settings=dict(fill_nan="drop")
4 )
5 print(oseries.settings)
AttributeError: module 'pastas' has no attribute 'TimeSeries'
[6]:
# Options 3
oseries = ps.TimeSeries(gwdata, name="Groundwater Level", fill_nan="drop")
print(oseries.settings)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[6], line 2
1 # Options 3
----> 2 oseries = ps.TimeSeries(gwdata, name="Groundwater Level", fill_nan="drop")
3 print(oseries.settings)
AttributeError: module 'pastas' has no attribute 'TimeSeries'
8.3.1. Predefined settings
All of the above methods yield the same result. It is up to the user which one is preferred.
A question that may arise with option 1 is what the possible strings for settings are and which configuration each one selects. The TimeSeries class contains a dictionary with predefined settings that are used often. You can ask the TimeSeries class this question:
[7]:
pd.DataFrame(ps.TimeSeries._predefined_settings).T
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[7], line 1
----> 1 pd.DataFrame(ps.TimeSeries._predefined_settings).T
AttributeError: module 'pastas' has no attribute 'TimeSeries'
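To illustrate how a predefined name, a dictionary, and keyword arguments could all end up in one settings dictionary, here is a minimal sketch in plain Python. The function resolve_settings and the contents of PREDEFINED are hypothetical and chosen for illustration only. They are not the values stored in the Pastas dictionary. Only the defaults freq="D" and fill_nan="interpolate" are taken from the text above.

```python
# Defaults as stated in the text; everything else is an illustrative assumption.
DEFAULTS = {"freq": "D", "fill_nan": "interpolate"}

# Hypothetical presets, NOT the actual predefined settings of Pastas.
PREDEFINED = {
    "oseries": {"fill_nan": "drop"},
    "prec": {"fill_nan": 0.0},
}


def resolve_settings(settings=None, **kwargs):
    """Merge defaults, a preset name or dictionary, and keyword arguments."""
    resolved = dict(DEFAULTS)
    if isinstance(settings, str):
        # Option 1: a string selects a predefined configuration.
        resolved.update(PREDEFINED[settings])
    elif isinstance(settings, dict):
        # Option 2: a dictionary overrides some or all of the settings.
        resolved.update(settings)
    elif settings is not None:
        raise TypeError("settings must be a string, a dictionary, or None.")
    # Option 3: keyword arguments are applied last.
    unknown = set(kwargs) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"Unknown settings: {sorted(unknown)}")
    resolved.update(kwargs)
    return resolved


option_1 = resolve_settings(settings="oseries")
option_2 = resolve_settings(settings=dict(fill_nan="drop"))
option_3 = resolve_settings(fill_nan="drop")
print(option_1, option_2, option_3)
```

In this sketch all three calls produce the same dictionary, which mirrors the statement that the three options yield the same result.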
8.4. 4. Let’s explore the possibilities
As said, Pastas TimeSeries objects are capable of handling time series in a way that is convenient for Pastas:
- Changing the frequency of the time series (sample_up, sample_down)
- Extending the time series (fill_before and fill_after)
- Normalizing the time series (norm, not fully supported yet)
We will now import some precipitation series measured at a daily frequency and show how the above methods work.
[8]:
# Import observed precipitation series
precip = pd.read_csv(
"../examples/data/rain_nb1.csv", parse_dates=["date"], index_col="date"
).squeeze()
prec = ps.TimeSeries(precip, name="Precipitation", settings="prec")
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[8], line 5
1 # Import observed precipitation series
2 precip = pd.read_csv(
3 "../examples/data/rain_nb1.csv", parse_dates=["date"], index_col="date"
4 ).squeeze()
----> 5 prec = ps.TimeSeries(precip, name="Precipitation", settings="prec")
AttributeError: module 'pastas' has no attribute 'TimeSeries'
[9]:
# fig, ax = plt.subplots(2, 1, figsize=(10,8))
# prec.update_series(freq="D")
# prec.series.plot.bar(ax=ax[0])
# prec.update_series(freq="7D")
# prec.series.plot.bar(ax=ax[1])
# import matplotlib.dates as mdates
# ax[1].fmt_xdata = mdates.DateFormatter('%m')
# fig.autofmt_xdate()
8.4.1. Wait, what?
We just changed the frequency of the TimeSeries. When reducing the frequency, the values were summed into the new bins. Conveniently, all pandas methods are still available and functional, such as the great plotting functionalities of Pandas.
All this happened in place, meaning the same object just took another shape based on the new settings. Moreover, it applied those new settings (freq="7D", i.e. weekly values) to the original series. This means that going back and forth between frequencies does not lead to any information loss.
Why is this so important? Because when solving or simulating a model, the Model will ask every member of the TimeSeries family to prepare itself with the necessary settings (e.g. a new freq) and perform that operation only once. When asked for a time series, the TimeSeries object will “be” in that new shape.
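The idea of always deriving the working series from the untouched original can be sketched with plain pandas. The class ResampledSeries below is a hypothetical stand-in, not the Pastas TimeSeries, and it assumes that values are summed when the frequency is reduced, as is appropriate for precipitation.

```python
import numpy as np
import pandas as pd


class ResampledSeries:
    """Hypothetical stand-in that keeps the original next to the working series."""

    def __init__(self, original):
        self.original = original.copy()
        self.series = self.original.copy()

    def update_series(self, freq):
        # Always start from the original, never from the current working
        # series, so switching frequencies loses no information.
        self.series = self.original.resample(freq).sum()


# Four weeks of synthetic daily precipitation.
index = pd.date_range("2020-01-01", periods=28, freq="D")
precip = pd.Series(np.arange(28, dtype=float), index=index)

ts = ResampledSeries(precip)
ts.update_series("7D")  # downsample: daily values are summed into 7-day bins
n_weekly = len(ts.series)
weekly_total = ts.series.sum()

ts.update_series("D")  # back to daily values, derived from the original again
print(n_weekly, weekly_total, len(ts.series))
```

Because the second call starts from the original again, the daily values are recovered exactly rather than being reconstructed from the 7-day sums.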
8.4.2. Some more action
Let’s say, we want to simulate the groundwater series for a period where no data is available for the time series, but we need some kind of value for the warmup period to prevent things from getting messy. The TimeSeries object can easily extend itself, as the following example shows.
[10]:
prec.update_series(tmin="2011")
prec.plot()
prec.settings
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[10], line 1
----> 1 prec.update_series(tmin="2011")
2 prec.plot()
3 prec.settings
NameError: name 'prec' is not defined
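Extending a series before its first observation can also be sketched with plain pandas. The helper extend_before is hypothetical, and filling the added period with the mean of the series (or a fixed number) is an assumption made for this example.

```python
import pandas as pd


def extend_before(series, tmin, freq="D", fill_value="mean"):
    """Hypothetical helper: extend a series back to tmin with a constant value."""
    tmin = pd.Timestamp(tmin)
    first = series.index[0]
    if tmin >= first:
        # Nothing to extend.
        return series.copy()
    # Use the mean of the series, or a user-provided number.
    value = series.mean() if fill_value == "mean" else float(fill_value)
    full_index = pd.date_range(tmin, series.index[-1], freq=freq)
    extended = series.reindex(full_index)
    # Only fill the part before the first observation, so that gaps inside
    # the original series are left untouched.
    extended[extended.index < first] = value
    return extended


index = pd.date_range("2012-01-01", periods=5, freq="D")
short = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=index)

extended = extend_before(short, tmin="2011-12-29")
print(extended)
```

The original series is not modified. The extended series is a new object that starts three days earlier and carries the mean value over the added period.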
8.5. 5. Exporting the TimeSeries
When done, we might want to store the TimeSeries object for later use. A built-in to_dict method exports the original time series, along with its current settings and name, to a dictionary that can be stored in JSON format. This way the original data are maintained and the TimeSeries can easily be recreated from a JSON file.
[11]:
data = prec.to_dict()
print(data.keys())
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[11], line 1
----> 1 data = prec.to_dict()
2 print(data.keys())
NameError: name 'prec' is not defined
[12]:
# Tadaa, we have our extended time series in weekly frequency back!
ts = ps.TimeSeries(**data)
ts.plot()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[12], line 2
1 # Tadaa, we have our extended time series in weekly frequency back!
----> 2 ts = ps.TimeSeries(**data)
3 ts.plot()
AttributeError: module 'pastas' has no attribute 'TimeSeries'
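The round trip through a dictionary and JSON can be illustrated with plain pandas and the standard library. The functions series_to_dict and series_from_dict and the layout of the dictionary are hypothetical. They do not reproduce the format written by the to_dict method.

```python
import json

import pandas as pd


def series_to_dict(series, name, settings):
    """Hypothetical export: store the original values, the name, and the settings."""
    return {
        "name": name,
        "settings": dict(settings),
        # ISO-formatted timestamps are valid JSON keys.
        "series": {t.isoformat(): float(v) for t, v in series.items()},
    }


def series_from_dict(data):
    """Hypothetical import: rebuild the series and settings from the dictionary."""
    series = pd.Series(data["series"], name=data["name"], dtype=float)
    series.index = pd.to_datetime(series.index)
    return series, data["settings"]


index = pd.date_range("2020-01-01", periods=3, freq="D")
original = pd.Series([0.0, 1.5, 3.0], index=index)

# Export to a JSON string and restore from it.
text = json.dumps(series_to_dict(original, "Precipitation", {"freq": "7D"}))
restored, settings = series_from_dict(json.loads(text))
print(restored)
print(settings)
```

Storing the original values together with the settings, rather than the manipulated series, is what allows the same working series to be rebuilt later.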