{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Autocorrelation for irregular time series\n", "*R.A. Collenteur, Artesia Water & University of Graz, 2020*\n", "\n", "In this notebook the autocorrelation function for irregular time steps that is built-in in Pastas is tested and validated on synthetic data. The methodology for calculating the autocorrelation is based on [Rehfeld et al. (2011)](#References) and [Edelson and Krolik (1978)](#References). The methods are available through `pastas.stats` package (e.g. `pastas.stats.acf(series)`). The full report of the methods underlying this Notebook are available in [Collenteur (2018, in Dutch)](#References). \n", "\n", "1. [Create synthetic time series](#Create-synthetic-time-series)\n", "2. [Compute the autocorrelation](#Computing-the-autocorrelation)\n", "3. [Autocorrelation for regular time series](#Autocorrelation-for-regular-time-series)\n", "4. [Autocorrelation for irregular time series](#Sine-wave-with-non-equidistant-timesteps)\n", "5. [References](#References)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import pastas as ps\n", "\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create synthetic time series\n", "Two synthetic time series are created with a known autocorrelation and a regular time interval. The first is a sine wave with a wave period of one year. The second is a series of correlated noise generated through an AR(1) process. Both synthetic time series have a length of ten years and a daily observation time step ($n=3650$). From both time series three time series with irregular time steps are created:\n", "\n", "- **Time Series 1:** 3 years of monthly data, 3 years of bi-weekly data, and 4 years of daily data.\n", "- **Time Series 2:** 3 years of monthly data, 3 years of bi-weekly data, a one year gap, and 4 years of daily data.\n", "- **Time Series 3:** reindex time series using the indices from a real groundwater level time series." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "index_test = pd.read_csv(\"../examples/data/test_index.csv\", parse_dates=True, index_col=0, names=[\"Date\"]).squeeze().index.ceil(\"D\").unique()\n", "n_years=10\n", "index = pd.to_datetime(np.arange(0, n_years*365, 1), unit=\"D\", origin=\"2005\")\n", "index.name = \"Time [Years]\"\n", "\n", "# 1. Sine timeseries 1: equal timesteps\n", "np.random.seed(0)\n", "data = np.sin(np.linspace(0, n_years*2*np.pi, len(index)))#+np.random.rand(index.size)\n", "v = pd.Series(data=data, index=index, name=\"STS_SIN\")\n", "\n", "# 2. Sine timeseries with three frequencies\n", "v1 = v.iloc[:-7*365].asfreq(\"30D\").append(v.iloc[-7*365:-4*365].asfreq(\"14D\")).append(v.iloc[-4*365:])\n", "v1.name = \"STS_SIN1\"\n", "\n", "# 3. Sine timeseries with three frequencies and a data gap\n", "v2 = v.iloc[:-7*365].asfreq(\"30D\").append(v.iloc[-7*365:-4*365].asfreq(\"14D\")).append(v.iloc[-3*365:])\n", "v2.name = \"STS_SIN2\"\n", "\n", "# 4. Sine timeseries with indices based on a true groundwater level measurement indices\n", "v3 = v.reindex(index_test).dropna()\n", "v3.name = \"STS_SIN3\"\n", "\n", "# Convoluting a random noise process with a exponential decay function to obtain a autocorrelation \n", "# timeseries similar to an Auto Regressive model of order 1 (AR(1))\n", "alpha = 10\n", "np.random.seed(0)\n", "n = np.random.rand(index.size+365)\n", "b = np.exp(-np.arange(366) / alpha)\n", "n = np.convolve(n, b, mode=\"valid\")\n", "n = n-n.mean()\n", "\n", "index = pd.to_datetime(np.arange(0, n.size, 1), unit=\"D\", origin=\"2005\")\n", "index.name = \"Time [Years]\"\n", "n = pd.Series(data=n, index=index, name=\"STS_EXP\")\n", "\n", "n1 = n.reindex(v1.index).dropna()\n", "n1.name = \"STS_EXP1\"\n", "n2 = n.reindex(v2.index).dropna()\n", "n2.name = \"STS_EXP2\"\n", "n3 = n.reindex(index_test).dropna()\n", "n3.name = \"STS_EXP3\"\n", "\n", "# Create a DataFrame with all series and plot them all\n", "d = pd.concat([v,v1,v2,v3,n,n1,n2,n3], axis=1)\n", "d.plot(subplots=True, figsize=(10,6), marker=\".\",markersize=2, color=\"steelblue\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Theoretical autocorrelation\n", "In this codeblock a figure is created showing the time series with equidistant timesteps and their theoretical autocorrelation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compute true autocorrelation functions\n", "acf_v_true = pd.Series(np.cos(np.linspace(0, 2*np.pi, 365)))\n", "acf_n_true = pd.Series(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing the autocorrelation\n", "\n", "The autocorrelation function for all time series is calculated for a number of time lags. Two different methods are used:\n", "\n", "- binning in a rectangular bin\n", "- weighting through a Gaussian kernel\n", "\n", "