{ "cells": [ { "cell_type": "markdown", "id": "50415948", "metadata": {}, "source": [ "# Improving Pastas performance with caching\n", "\n", "This notebook shows how Pastas performance can be improved by caching computation results." ] }, { "cell_type": "code", "execution_count": null, "id": "6c5afe91", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "import pastas as ps\n", "\n", "ps.set_log_level(\"WARNING\")" ] }, { "cell_type": "markdown", "id": "56c395ae", "metadata": {}, "source": [ "Load some test data for the examples." ] }, { "cell_type": "code", "execution_count": null, "id": "fb231873", "metadata": {}, "outputs": [], "source": [ "head = pd.read_csv(\"data/heby_head.csv\", index_col=0, parse_dates=True).squeeze(\n", "    \"columns\"\n", ")\n", "evap = pd.read_csv(\"data/heby_evap.csv\", index_col=0, parse_dates=True).squeeze(\n", "    \"columns\"\n", ")\n", "prec = pd.read_csv(\"data/heby_prec.csv\", index_col=0, parse_dates=True).squeeze(\n", "    \"columns\"\n", ")\n", "temp = pd.read_csv(\"data/heby_temp.csv\", index_col=0, parse_dates=True).squeeze(\n", "    \"columns\"\n", ")" ] }, { "cell_type": "markdown", "id": "4c5ffdfd", "metadata": {}, "source": [ "If the `cachetools` module is installed, Pastas can cache intermediate results for certain stressmodels. The cache essentially works as a dictionary that checks whether the input arguments to a function, e.g. `StressModel.simulate()`, are already stored in the cache. If so, it returns the stored solution; otherwise it computes the solution and adds it to the cache.\n", "\n", "
\n", "Note: The tradeoff with caching is that it can speed up Pastas by skipping some computations, but it uses more internal memory to store those intermediate results. \n", "
\n" ] }, { "cell_type": "markdown", "id": "b7a76d6e", "metadata": {}, "source": [ "By default, caching is turned off." ] }, { "cell_type": "code", "execution_count": null, "id": "7161214c", "metadata": {}, "outputs": [], "source": [ "ps.get_use_cache()" ] }, { "cell_type": "markdown", "id": "c1dac031", "metadata": {}, "source": [ "You can turn it on with `ps.set_use_cache(True)`:" ] }, { "cell_type": "code", "execution_count": null, "id": "2e0ade2e", "metadata": {}, "outputs": [], "source": [ "ps.set_use_cache(True)\n", "ps.get_use_cache() # show that value has changed" ] }, { "cell_type": "markdown", "id": "09757d0f", "metadata": {}, "source": [ "The cache is stored in StressModels under the `._cache` attribute. The following StressModel should indicate it contains an `LRUCache` that is currently empty.\n", "\n", "The size of the cache can be set when creating a StressModel using the `max_cache_size` parameter. The default size is 32, meaning 32 solutions can be stored. For models with many parameters or complex optimization, you may want to increase this value.\n", "\n", "\n", "
\n", "Note: if `cachetools` is not available, the `._cache` attribute will be `None`.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "id": "66f844ad", "metadata": {}, "outputs": [], "source": [ "sm = ps.StressModel(prec, ps.Exponential(), \"test\")\n", "sm._cache" ] }, { "cell_type": "markdown", "id": "c508e8de", "metadata": {}, "source": [ "Let's use an example to show the effect of caching on the performance of `ml.solve()`. This is an example from `example_snow.py`, a time series model with a non-linear recharge model." ] }, { "cell_type": "code", "execution_count": null, "id": "96a40005", "metadata": {}, "outputs": [], "source": [ "def build_model():\n", " ml = ps.Model(head)\n", " sm = ps.RechargeModel(\n", " prec,\n", " evap,\n", " recharge=ps.rch.FlexModel(snow=True),\n", " rfunc=ps.Gamma(),\n", " name=\"rch\",\n", " temp=temp,\n", " )\n", " ml.add_stressmodel(sm)\n", " ml.set_parameter(\"rch_kv\", vary=False)\n", " return ml" ] }, { "cell_type": "markdown", "id": "80aabe04", "metadata": {}, "source": [ "Here we create a function to solve the model using a two-step method. First we solve for the optimal parameters without a noise model, then using those estimates of the parameters, we add a noise model and solve for the parameters again. " ] }, { "cell_type": "code", "execution_count": null, "id": "f07420ad", "metadata": {}, "outputs": [], "source": [ "def two_step_solve(ml):\n", " tmin = \"1985\"\n", " tmax = \"2018\"\n", " ml.solve(tmin=tmin, tmax=tmax, fit_constant=False, report=False)\n", " ml.add_noisemodel(ps.ArNoiseModel())\n", " ml.set_parameter(\"rch_ks\", vary=False)\n", " ml.solve(tmin=tmin, tmax=tmax, fit_constant=False, initial=False, report=False)" ] }, { "cell_type": "markdown", "id": "a19eb3e6", "metadata": {}, "source": [ "Now let's test how long it takes to solve the model without using the cache." 
] }, { "cell_type": "code", "execution_count": null, "id": "efe7686c", "metadata": {}, "outputs": [], "source": [ "ps.set_use_cache(False) # turn off caching\n", "ml = build_model()\n", "t0 = %timeit -r 1 -n 1 -o two_step_solve(ml) # timing the solve" ] }, { "cell_type": "markdown", "id": "a685bc25", "metadata": {}, "source": [ "And now with the cache:" ] }, { "cell_type": "code", "execution_count": null, "id": "22689b0b", "metadata": {}, "outputs": [], "source": [ "ps.set_use_cache(True) # turn on caching\n", "ml = build_model()\n", "t1 = %timeit -r 1 -n 1 -o two_step_solve(ml) # timing the solve again" ] }, { "cell_type": "markdown", "id": "2d93a98a", "metadata": {}, "source": [ "Note that the cache is now in use:" ] }, { "cell_type": "code", "execution_count": null, "id": "c7797488", "metadata": {}, "outputs": [], "source": [ "ml.stressmodels[\"rch\"]._cache.currsize" ] }, { "cell_type": "markdown", "id": "31d27a17", "metadata": {}, "source": [ "And some statistics about the performance gain achieved by activating caching:" ] }, { "cell_type": "code", "execution_count": null, "id": "c727e144", "metadata": {}, "outputs": [], "source": [ "diff = t0.average - t1.average\n", "speedup = diff / (t0.average) * 100\n", "print(f\"Model solve time reduced by {diff:.1f}s ({speedup:.0f}%) by using caching.\")" ] }, { "cell_type": "markdown", "id": "eb3ed3c3", "metadata": {}, "source": [ "Finally, another nice bonus when using caching is that the optimal solution is already stored, meaning any calls to functions in which the optimal solution is needed (e.g. plotting) will be extra fast." ] }, { "cell_type": "code", "execution_count": null, "id": "74b5b3b5", "metadata": {}, "outputs": [], "source": [ "ml.plots.results();" ] }, { "cell_type": "markdown", "id": "cbd12a1a", "metadata": {}, "source": [ "## Important Notes\n", "\n", "
\n", "Cache Invalidation: The cache does not automatically detect when you modify stress data or response function settings. If you change these after creating a model, you must manually clear the cache using `stressmodel._cache.clear()`, or rebuild the stress model.\n", "
\n", "\n", "
\n", "Cache Performance: The cache uses exact floating-point comparison for parameter values. During optimization, minor variations in parameter values (e.g., 1.0000000001 vs 1.0) create separate cache entries, so near-identical parameter sets are recomputed rather than served from the cache.\n", "
\n", "\n", "In normal workflows where you create a model, solve it, and use the results, cache invalidation is not an issue." ] }, { "cell_type": "markdown", "id": "50ef1129", "metadata": {}, "source": [ "## Temporarily enabling or disabling caching\n", "\n", "For operations where you know caching won't help, you can temporarily disable caching using a context manager:" ] }, { "cell_type": "code", "execution_count": null, "id": "5bae9602", "metadata": {}, "outputs": [], "source": [ "with ps.temporarily_disable_cache():\n", "    # Caching is disabled here\n", "    print(f\"Caching is temporarily: {'enabled' if ps.get_use_cache() else 'disabled'}\")\n", "    params = ml.get_parameters()\n", "    result = ml.simulate(params)\n", "\n", "# Caching is automatically re-enabled after the block\n", "print(f\"Caching is now: {'enabled' if ps.get_use_cache() else 'disabled'}\")" ] }, { "cell_type": "markdown", "id": "4b6c7400", "metadata": {}, "source": [ "Or if you wish to temporarily enable caching:" ] }, { "cell_type": "code", "execution_count": null, "id": "716f310a", "metadata": {}, "outputs": [], "source": [ "ps.set_use_cache(False) # turn off caching\n", "\n", "with ps.temporarily_enable_cache():\n", "    # Caching is enabled here\n", "    print(f\"Caching is temporarily: {'enabled' if ps.get_use_cache() else 'disabled'}\")\n", "    params = ml.get_parameters()\n", "    result = ml.simulate(params)\n", "\n", "# Caching is automatically disabled again after the block\n", "print(f\"Caching is now: {'enabled' if ps.get_use_cache() else 'disabled'}\")" ] } ], "metadata": { "kernelspec": { "display_name": "pastas", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" } }, "nbformat": 4, "nbformat_minor": 5 }