datasusr can cache DATASUS downloads in a local
directory so that repeated calls do not hit the DATASUS FTP server again. This
is especially useful when developing analysis pipelines
interactively.
When you call datasus_download() with
use_cache = TRUE (the default), files are stored in a
structured subdirectory tree under the cache folder. On subsequent calls
for the same files, the cached versions are reused without any network
access.
By default, downloads are placed in a session-scoped subdirectory of
tempdir() (which R cleans up automatically when the session
ends), so the package never writes outside the session's temporary
directory unless you opt in.
The cache location is resolved in the following order:
1. cache_dir function argument
2. DATASUSR_CACHE_DIR environment variable
3. datasusr.cache_dir R option
4. file.path(tempdir(), "datasusr-cache"), the session-scoped fallback

To enable a persistent cache that survives across sessions, point one
of the first three to a directory of your choice, for example
tools::R_user_dir("datasusr", "cache").
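The lookup order above can be pictured with a small base R helper. This is only a sketch: resolve_cache_dir() is a hypothetical name, not a datasusr function, and treating an empty string as "unset" is an assumption rather than documented behaviour.

```r
# Hypothetical helper mirroring the documented lookup order.
# It is not part of datasusr; it only illustrates the precedence.
resolve_cache_dir <- function(cache_dir = NULL) {
  # 1. An explicit function argument wins.
  if (!is.null(cache_dir) && nzchar(cache_dir)) {
    return(cache_dir)
  }

  # 2. Environment variable (assumption: an empty value counts as unset).
  env_dir <- Sys.getenv("DATASUSR_CACHE_DIR", unset = "")
  if (nzchar(env_dir)) {
    return(env_dir)
  }

  # 3. R option.
  opt_dir <- getOption("datasusr.cache_dir", default = NULL)
  if (!is.null(opt_dir) && nzchar(opt_dir)) {
    return(opt_dir)
  }

  # 4. Session-scoped fallback under tempdir().
  file.path(tempdir(), "datasusr-cache")
}
```

With nothing configured, resolve_cache_dir() returns the tempdir() fallback; passing an argument such as resolve_cache_dir(tools::R_user_dir("datasusr", "cache")) overrides every other source.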
To set it globally, add a line to your .Renviron:
DATASUSR_CACHE_DIR=/path/to/my/cache
Or in R:
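A minimal sketch using only base R calls, assuming the option and environment variable names from the resolution order are read when a download function is called. The path is a placeholder:

```r
# Placeholder path; replace it with a directory of your choice.
cache_path <- "/path/to/my/cache"

# Set the R option for the current session ...
options(datasusr.cache_dir = cache_path)

# ... or set the environment variable from within R.
Sys.setenv(DATASUSR_CACHE_DIR = cache_path)
```

Both settings last only for the current session. To apply the option in every session, the options() call can go in your .Rprofile.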
Pass refresh = TRUE to datasus_download()
(or datasus_fetch()) to re-download files even when they
exist in the cache:
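The exact arguments of datasus_download() are not repeated here, so the sketch below illustrates the refresh semantics with a stand-in instead. fetch_cached() and fake_download() are hypothetical names, and the logic is an assumption about how such a flag typically behaves, not datasusr's implementation.

```r
# Hypothetical stand-in for the caching logic. `download_fun` is any
# function that writes the requested file to `dest`.
fetch_cached <- function(name, cache_dir, download_fun, refresh = FALSE) {
  dest <- file.path(cache_dir, name)

  # Cache hit: reuse the file without calling the downloader.
  if (!refresh && file.exists(dest)) {
    return(dest)
  }

  # Cache miss or forced refresh: (re)download into the cache.
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  download_fun(dest)
  dest
}

# Fake downloader that counts how often it is called.
calls <- 0
fake_download <- function(dest) {
  calls <<- calls + 1
  writeLines("payload", dest)
}

demo_dir <- file.path(tempdir(), "cache-demo")
unlink(demo_dir, recursive = TRUE)  # start from an empty demo cache

fetch_cached("a.txt", demo_dir, fake_download)                  # downloads
fetch_cached("a.txt", demo_dir, fake_download)                  # cache hit
fetch_cached("a.txt", demo_dir, fake_download, refresh = TRUE)  # downloads again
calls  # 2
```

The second call never reaches the downloader, which corresponds to the "no network access" behaviour described earlier; only refresh = TRUE forces a new download.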
Over time the cache can grow large. Two functions help manage its size:
# Remove files older than 90 days
datasus_cache_prune(older_than_days = 90)
# Keep the total cache under 5 GB
datasus_cache_prune(max_size_bytes = 5 * 1024^3)
# Remove everything
datasus_cache_clear()
When pruning by size, the least-recently-accessed files are removed first.
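Size-based pruning of this kind can be sketched in base R as follows. prune_to_size() is a hypothetical helper, not the package's code, and it assumes that the access times reported by file.info() are meaningful on your file system (some systems are mounted so that access times are not updated on read).

```r
# Hypothetical sketch of least-recently-accessed pruning.
# Deletes files, oldest access time first, until the total size
# of the cache is at or below `max_size_bytes`.
prune_to_size <- function(cache_dir, max_size_bytes) {
  files <- list.files(cache_dir, recursive = TRUE, full.names = TRUE)
  info <- file.info(files)

  # Order files so the least recently accessed come first.
  ord <- order(info$atime)
  files <- files[ord]
  sizes <- info$size[ord]

  total <- sum(sizes)
  removed <- character(0)
  for (i in seq_along(files)) {
    if (total <= max_size_bytes) break
    unlink(files[i])
    total <- total - sizes[i]
    removed <- c(removed, files[i])
  }

  # Return the deleted paths invisibly so callers can log them.
  invisible(removed)
}
```

Calling prune_to_size(cache_dir, 5 * 1024^3) on a directory would mirror the 5 GB limit shown above.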