{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Time series in Pastas\n", "\n", "Time series are at the heart of Pastas and modeling hydraulic head fluctuations. In this section background information is provided on important characteristics of time series and how these may influence your modeling results." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import pastas as ps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Different types of time series\n", "\n", "### Regular and irregular time series\n", "\n", "Time series data are a set of data values measured at certain times, ordered in a way that the time indices are increasing. Many time series analysis and modeling methods require that the time step between the measurements is always the same or in other words, equidistant. Such regular time series may have missing data, but those may be filled so that computations can be done with a constant time step. Hydraulic heads are often measured at irregular time intervals; such time series are called irregular. This is especially true for historic time series that were measured by hand. The figure below graphically shows the difference between the three types of time series." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "regular = pd.Series(\n", " index=pd.date_range(\"2000-01-01\", \"2000-01-10\", freq=\"D\"), data=np.ones(10)\n", ")\n", "missing_data = regular.copy()\n", "missing_data.loc[[\"2000-01-03\", \"2000-01-08\"]] = np.nan\n", "\n", "index = [t + pd.Timedelta(np.random.rand() * 24, unit=\"h\") for t in missing_data.index]\n", "irregular = missing_data.copy()\n", "irregular.index = index\n", "\n", "fig, axes = plt.subplots(3, 1, figsize=(6, 3), sharex=True, sharey=True)\n", "\n", "regular.plot(ax=axes[0], linestyle=\" \", marker=\"o\", x_compat=True)\n", "missing_data.plot(ax=axes[1], linestyle=\" \", marker=\"o\", x_compat=True)\n", "irregular.plot(ax=axes[2], linestyle=\" \", marker=\"o\", x_compat=True)\n", "\n", "for i, name in enumerate(\n", " [\"(a) Regular time steps\", \"(b) Missing data\", \"(c) Irregular time steps\"]\n", "):\n", " axes[i].grid()\n", " axes[i].set_title(name)\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Independent and dependent time series\n", "\n", "We can differentiate between two types of input time series for Pastas models: the dependent and independent time series. The dependent time series are those that we want to explain/model (e.g., the groundwater levels), and are referred to as the `oseries` in Pastas (short for observation series). The independent time series are those that we use to model the dependent time series (e.g., precipitation or evaporation), and are referred to as `stresses` in Pastas. The requirements for these time series are different:\n", "\n", "- The dependent time series may be of any kind: regular, missing data or irregular.\n", "- The stress time series must have regular time steps.\n", "\n", "### A word on timestamps and measurements\n", "\n", "Stresses often represent a flux measured over a certain time period. 
For example, precipitation is often provided in mm/day, and the value represents the cumulative precipitation amount for a day. This is recorded at the end of the day, e.g., a measurement with a time stamp of 2000-01-01 represents the total precipitation that fell on the first of January in the year 2000.\n", "\n", "## The Python package for time series data: `pandas`\n", "\n", "The `pandas` package provides a lot of methods to deal with time series data, such as resampling, gap-filling, and computing descriptive statistics. Another important feature of `pandas` is `pandas.read_csv` and related functions, which facilitate the loading of data from csv-files and other popular data storage formats. Pastas requires all time series to be provided as `pandas.Series` with a `pandas.DatetimeIndex`; examples are provided. For more information and user guidance on `pandas` please see their documentation website (https://pandas.pydata.org).\n", "\n", "## Validating user-provided time series\n", "\n", "As is clear from the descriptions above, the user is required to provide time series in a certain format and with certain characteristics, depending on the type of time series. To prevent issues later in the modeling chain, all user-provided time series are internally checked to make sure all requirements are met. This is done using the `pastas.validate_stress` and `pastas.validate_oseries` methods, which can also be called directly by the user. Let's look at the docstring of these methods to see what is checked:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "?ps.validate_stress" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last check (equidistant time steps) is not required for `oseries`. If any of these checks fail, the `pastas.validate_stress` and `pastas.validate_oseries` methods will raise an error with pointers on how to solve the problem and fix the time series. 
We refer to the Examples-section for more examples on how to pre-process the user-provided time series." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Settings for user-provided time series\n", "\n", "User-provided time series must be long enough and of the right frequency. If this is not the case, Pastas will adjust the provided time series. Internally, time series are stored in a `pastas.TimeSeries` object, which has functions to extend the time series forward and backward in time (when needed) and/or resample the time series to a different frequency (when needed). How these two operations are performed depends on the `settings` that are provided. An appropriate setting must be specified when creating a `StressModel` object. \n", "For example, specify the setting as `prec` for a `StressModel` that simulates the effect of precipitation. The predefined settings and their associated operations can be accessed through `ps.rcParams[\"timeseries\"]` (see code cell below). It can be seen, for example, that when the setting is `prec` (precipitation), a missing value is replaced by the value zero (`fill_nan=0.0` in the table) and when the series has to be extended in the past, it is filled with the mean value of the provided series (`fill_before=mean` in the table). " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pd.DataFrame.from_dict(ps.rcParams[\"timeseries\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each column name is a valid option for the `settings` argument. For example, the default setting for the precipitation stress provided to the `ps.RechargeModel` object is \"prec\" (see the docstring of `ps.RechargeModel`). 
This means that the following settings are used:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ps.rcParams[\"timeseries\"][\"prec\"]\n", "\n", "# sm = ps.StressModel(stress, settings=\"prec\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, one may provide a dictionary with the settings to a stress model object. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "settings = {\n", " \"fill_before\": 0.0,\n", " \"fill_after\": 0.0,\n", " # Etcetera\n", "}\n", "# sm = ps.StressModel(stress, settings=settings)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If Pastas performs an operation on a time series provided by the user, it will **always** output an INFO message describing what is done. To see these messages, make sure the log level of Pastas is set to \"INFO\" by running `ps.set_log_level(\"INFO\")` beforehand." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Formal requirements for time series\n", "\n", "The formal requirements for a time series are listed below. When using, for example, `pandas.read_csv` correctly, these requirements are automatically met.\n", "\n", "* The type for a date must be `pandas.Timestamp`.\n", "* The type for a sequence of dates must be `pandas.DatetimeIndex` with `pandas.Timestamp`s.\n", "* The type for a time series must be a `pandas.Series` with a `pandas.DatetimeIndex`.\n", "* The dtype for the values of a `pandas.Series` must be `float`. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.5" } }, "nbformat": 4, "nbformat_minor": 4 }