| Title: | Query 'Azure Data Lake Storage Gen2' with 'DuckDB' |
| Version: | 0.1.0 |
| Description: | Provides convenience utilities for using 'DuckDB' directly over datasets stored in 'Azure Data Lake Storage Gen2' (ADLS Gen2, 'abfss://'). Opens connections configured for Azure-backed 'Delta Lake' and 'Parquet' data, registers Azure credentials as 'DuckDB' secrets, and supports optional repository mirrors for restricted networks. Integrates well with 'DBI' for SQL workflows and with 'dplyr' and 'dbplyr' for lazy table queries. |
| URL: | https://github.com/pedrobtz/quak |
| BugReports: | https://github.com/pedrobtz/quak/issues |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Imports: | cli, curl, DBI, duckdb, fs, glue, rlang, tools, utils |
| Suggests: | azr (≥ 0.3.4), dbplyr, dplyr, testthat (≥ 3.0.0), tibble, withr |
| Config/testthat/edition: | 3 |
| Collate: | 'options.R' 'conditions.R' 'connection.R' 'cache.R' 'repositories.R' 'extensions.R' 'azure.R' 'datasets.R' 'tables.R' 'lake.R' 'delta.R' 'quak-package.R' 'zzz.R' |
| NeedsCompilation: | no |
| Packaged: | 2026-06-04 20:51:19 UTC; pbtz |
| Author: | Pedro Baltazar [aut, cre, cph] |
| Maintainer: | Pedro Baltazar <pedrobtz@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-09 15:50:02 UTC |
quak: Query 'Azure Data Lake Storage Gen2' with 'DuckDB'
Description
Provides convenience utilities for using 'DuckDB' directly over datasets stored in 'Azure Data Lake Storage Gen2' (ADLS Gen2, 'abfss://'). Opens connections configured for Azure-backed 'Delta Lake' and 'Parquet' data, registers Azure credentials as 'DuckDB' secrets, and supports optional repository mirrors for restricted networks. Integrates well with 'DBI' for SQL workflows and with 'dplyr' and 'dbplyr' for lazy table queries.
Author(s)
Maintainer: Pedro Baltazar <pedrobtz@gmail.com> [copyright holder]
See Also
Useful links:
- https://github.com/pedrobtz/quak
- Report bugs at https://github.com/pedrobtz/quak/issues
Open a DuckDB connection configured for Azure Data Lake Storage Gen2
Description
Opens a DuckDB connection and installs the azure and delta extensions.
No secret is registered — use az_set_token_secret(), az_set_sp_secret(),
or az_set_chain_secret() to supply credentials afterwards.
Usage
az_conn(conn = NULL)
Arguments
conn |
An existing DuckDB connection to configure. When |
Value
A DuckDB connection. The caller owns its lifetime; disconnect with
DBI::dbDisconnect(conn, shutdown = TRUE).
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn() |>
az_set_token_secret(token = my_token)
DBI::dbDisconnect(conn, shutdown = TRUE)
## End(Not run)
Get Azure settings from a DuckDB connection
Description
Queries duckdb_settings() and returns all entries whose name contains
"azure".
Usage
az_conn_settings(conn = az_conn())
Arguments
conn |
A DuckDB connection. Defaults to |
Value
A tibble::tibble() with columns name, value, description.
Examples
conn <- DBI::dbConnect(duckdb::duckdb())
az_conn_settings(conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
Copy data to Azure Data Lake Storage Gen2
Description
Writes a lazy table, data frame, or SQL query to an abfs:// or
abfss:// URL using DuckDB's COPY ... TO command.
Usage
az_copy_to(
conn,
x,
url,
format = c("parquet", "csv", "json"),
partition_by = NULL,
overwrite = FALSE
)
Arguments
conn |
A DuckDB connection. |
x |
A lazy |
url |
Character scalar. Azure Blob URL to write to. |
format |
Output format. One of |
partition_by |
Optional character vector of columns to partition by. |
overwrite |
Logical. When |
Value
Invisibly returns url.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_copy_to(
conn,
"SELECT * FROM events WHERE event_date >= DATE '2026-01-01'",
"abfss://container@account/exports/events",
format = "parquet"
)
## End(Not run)
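To make the COPY ... TO step more concrete, here is a minimal sketch that only assembles a statement as a string. `sketch_copy_sql()` is a hypothetical helper, not part of quak. FORMAT, PARTITION_BY, and OVERWRITE_OR_IGNORE are DuckDB COPY options, but whether az_copy_to() emits exactly this statement, or maps overwrite to this option, is an assumption.

```r
# Hypothetical helper (not part of quak): builds a DuckDB COPY ... TO
# statement as a string. Nothing is executed.
sketch_copy_sql <- function(query, url, format = "parquet",
                            partition_by = NULL, overwrite = FALSE) {
  # Quote SQL string literals and identifiers by doubling embedded quotes.
  quote_lit <- function(x) paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
  quote_id <- function(x) paste0('"', gsub('"', '""', x, fixed = TRUE), '"')

  opts <- paste("FORMAT", format)
  if (!is.null(partition_by)) {
    cols <- paste(quote_id(partition_by), collapse = ", ")
    opts <- c(opts, paste0("PARTITION_BY (", cols, ")"))
  }
  if (isTRUE(overwrite)) {
    # Assumption: overwrite = TRUE maps to OVERWRITE_OR_IGNORE.
    opts <- c(opts, "OVERWRITE_OR_IGNORE")
  }

  paste0("COPY (", query, ") TO ", quote_lit(url),
         " (", paste(opts, collapse = ", "), ")")
}

sketch_copy_sql(
  "SELECT * FROM events",
  "abfss://container@account/exports/events",
  partition_by = "event_date"
)
```

Building the statement separately from executing it keeps the quoting logic easy to inspect before anything is written to storage.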
Get the default Azure OAuth scope
Description
Returns the Azure OAuth scope used in examples and token-based authentication
helpers. Configure it with options(quak.default_scope = "...") or the
QUAK_DEFAULT_SCOPE environment variable.
Usage
az_default_scope()
Value
A character scalar OAuth scope.
Examples
az_default_scope()
List files in a Delta table on Azure Data Lake Storage Gen2
Description
Returns DuckDB's delta_list_files() output for a Delta table.
Usage
az_delta_files(conn, url)
Arguments
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL pointing to a Delta table. |
Value
A tibble-like data frame with the active file manifest.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_delta_files(conn, "abfss://container@account/tables/sales")
## End(Not run)
Check whether data exists at an Azure path
Description
For an exact file or glob pattern, checks whether DuckDB's glob() returns
at least one match. For a plain path, also probes url/** so dataset
prefixes count as existing when they contain at least one object.
Usage
az_exists(conn, url)
Arguments
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL or glob pattern. |
Value
Logical scalar.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_exists(conn, "abfss://container@account/data/sales")
## End(Not run)
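The probing rule described above can be sketched as a pure function that returns the patterns a check would pass to glob(). `sketch_probe_patterns()` is a hypothetical helper, and treating `*`, `?`, and `[` as the glob markers is an assumption about how a pattern is told apart from a plain path.

```r
# Hypothetical helper (not part of quak): returns the glob patterns that
# would be probed for a given URL. No network access is involved.
sketch_probe_patterns <- function(url) {
  # Assumption: any of these characters marks the URL as a glob pattern.
  markers <- c("*", "?", "[")
  is_glob <- any(vapply(
    markers,
    function(ch) grepl(ch, url, fixed = TRUE),
    logical(1)
  ))
  if (is_glob) {
    return(url)
  }
  # Plain path: probe the path itself, then everything below it.
  base <- sub("/+$", "", url)
  c(url, paste0(base, "/**"))
}

sketch_probe_patterns("abfss://container@account/data/sales")
sketch_probe_patterns("abfss://container@account/data/*.parquet")
```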
Preview an Azure dataset
Description
Prints a small preview and invisibly returns it as a tibble-like data frame.
Usage
az_glimpse(conn, url, n = 10, format = NULL)
Arguments
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. |
n |
Number of rows to preview. Default |
format |
Optional format override. One of |
Value
Invisibly returns the preview tibble-like data frame.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_glimpse(conn, "abfss://container@account/data/*.parquet", n = 5)
## End(Not run)
List Azure paths matching a glob pattern
Description
Uses DuckDB's glob() table function over Azure storage.
Usage
az_glob(conn, pattern)
Arguments
conn |
A DuckDB connection. |
pattern |
Character scalar. |
Value
Character vector of matching paths.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_glob(conn, "abfss://container@account/data/*.parquet")
## End(Not run)
List Azure secrets registered in DuckDB
Description
Queries duckdb_secrets() and returns secrets whose type is "azure".
Values are returned as DuckDB reports them; DuckDB handles redaction of
sensitive fields.
Usage
az_list_secrets(conn = conn_default())
Arguments
conn |
A DuckDB connection. Defaults to |
Value
A tibble::tibble() with the columns returned by
duckdb_secrets().
Examples
conn <- DBI::dbConnect(duckdb::duckdb())
az_list_secrets(conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
Inspect a dataset schema without collecting data
Description
Uses DuckDB's DESCRIBE SELECT over a remote scan and returns only column
names and DuckDB types.
Usage
az_schema(conn, url, format = NULL)
Arguments
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. |
format |
Optional format override. One of |
Value
A tibble-like data frame with columns name and type.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_schema(conn, "abfss://container@account/data/*.parquet")
## End(Not run)
Register an Azure credential-chain secret
Description
Creates or replaces a DuckDB Azure secret using the credential_chain
provider. This lets DuckDB resolve credentials itself, for example from the
Azure CLI or environment.
Usage
az_set_chain_secret(conn, account = NULL, chain = "default")
Arguments
conn |
A DuckDB connection. |
account |
Optional storage account name. When supplied, the secret is scoped to that account. |
chain |
Optional character vector of DuckDB credential-chain entries.
Values are joined with semicolons and passed as DuckDB's |
Value
Invisibly returns conn.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_set_chain_secret(conn, chain = "cli")
## End(Not run)
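For orientation, the statement behind a credential-chain secret might look like the string built below. This is a sketch under stated assumptions: `sketch_chain_secret_sql()` and the secret name `quak_azure` are hypothetical, and using ACCOUNT_NAME to scope the secret to one account is a guess at what the package does. TYPE, PROVIDER, CHAIN, and ACCOUNT_NAME are parameters of DuckDB's Azure secrets.

```r
# Hypothetical helper (not part of quak): builds a CREATE OR REPLACE SECRET
# statement for DuckDB's credential_chain provider. Nothing is executed.
sketch_chain_secret_sql <- function(chain = "default", account = NULL,
                                    secret_name = "quak_azure") {
  quote_lit <- function(x) paste0("'", gsub("'", "''", x, fixed = TRUE), "'")

  fields <- c(
    "TYPE azure",
    "PROVIDER credential_chain",
    # Chain entries are joined with semicolons, as described above.
    paste("CHAIN", quote_lit(paste(chain, collapse = ";")))
  )
  if (!is.null(account)) {
    # Assumption: the account is passed as ACCOUNT_NAME.
    fields <- c(fields, paste("ACCOUNT_NAME", quote_lit(account)))
  }

  paste0("CREATE OR REPLACE SECRET ", secret_name,
         " (", paste(fields, collapse = ", "), ")")
}

sketch_chain_secret_sql(chain = c("cli", "env"), account = "myaccount")
```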
Register an Azure service-principal secret
Description
Creates or replaces a DuckDB Azure secret using the service_principal
provider.
Usage
az_set_sp_secret(conn, tenant_id, client_id, client_secret, account = NULL)
Arguments
conn |
A DuckDB connection. |
tenant_id |
Character scalar. Azure Entra tenant ID. |
client_id |
Character scalar. Service principal client ID. |
client_secret |
Character scalar. Service principal client secret. |
account |
Optional storage account name. When supplied, the secret is scoped to that account. |
Value
Invisibly returns conn.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_set_sp_secret(
conn,
tenant_id = "00000000-0000-0000-0000-000000000000",
client_id = Sys.getenv("AZURE_CLIENT_ID"),
client_secret = Sys.getenv("AZURE_CLIENT_SECRET")
)
## End(Not run)
Register an Azure token secret
Description
Creates or replaces a DuckDB Azure secret using the access_token provider.
Use this when another package has already obtained an access token and you
want to register or refresh a token secret.
Usage
az_set_token_secret(conn, token, account = NULL)
Arguments
conn |
A DuckDB connection. |
token |
Character scalar. Access token value. |
account |
Optional storage account name. When supplied, the secret is
scoped to |
Value
Invisibly returns conn.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_set_token_secret(conn, token = "<access-token>")
## End(Not run)
Tune Azure read settings on a DuckDB connection
Description
Sets the Azure performance and transport settings exposed by DuckDB. Each
argument defaults to NULL, which leaves that setting unchanged.
Usage
az_tune(
conn,
concurrency = NULL,
chunk_size = NULL,
buffer_size = NULL,
transport = NULL,
metadata_cache = NULL,
context_cache = NULL
)
Arguments
conn |
A DuckDB connection. |
concurrency |
Optional positive whole number for
|
chunk_size |
Optional positive whole number or character scalar for
|
buffer_size |
Optional positive whole number or character scalar for
|
transport |
Optional character scalar for
|
metadata_cache |
Optional logical scalar for
|
context_cache |
Optional logical scalar for |
Value
Invisibly returns conn.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_tune(conn, concurrency = 8, metadata_cache = TRUE)
## End(Not run)
Write Parquet data to Azure Data Lake Storage Gen2
Description
Thin convenience wrapper around az_copy_to() with
format = "parquet".
Usage
az_write_parquet(conn, x, url, partition_by = NULL, overwrite = FALSE)
Arguments
conn |
A DuckDB connection. |
x |
A lazy |
url |
Character scalar. Azure Blob URL to write to. |
partition_by |
Optional character vector of columns to partition by. |
overwrite |
Logical. When |
Value
Invisibly returns url.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
az_write_parquet(conn, data.frame(x = 1:3), "abfss://container@account/x")
## End(Not run)
Validate that a URL is an Azure Data Lake URL
Description
Validate that a URL is an Azure Data Lake URL
Usage
check_azure_url(url)
Arguments
url |
Character scalar. URL to validate. |
Value
Invisibly returns NULL; called for its side effect of aborting
when url is not an abfss:// URL.
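A validator with this contract can be approximated in base R as follows. `sketch_check_azure_url()` is a hypothetical stand-in; the real function may use different checks and a different error mechanism (for example rlang conditions rather than stop()).

```r
# Hypothetical stand-in (not the package's implementation): aborts unless
# `url` is a single, non-missing string that starts with "abfss://".
sketch_check_azure_url <- function(url) {
  ok <- is.character(url) &&
    length(url) == 1L &&
    !is.na(url) &&
    startsWith(url, "abfss://")
  if (!ok) {
    stop("`url` must be a single abfss:// URL.", call. = FALSE)
  }
  invisible(NULL)
}

sketch_check_azure_url("abfss://container@account/data/sales")
```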
Pre-flight checks before collecting a tbl_az
Description
Aborts when the backing DuckDB connection is closed or the azure extension
is not loaded, turning otherwise cryptic collect-time failures into
actionable messages.
Usage
check_tbl_az(x, call = rlang::caller_env())
Arguments
x |
A |
call |
The calling environment, used for error reporting. |
Value
Invisibly returns x.
Collect an Azure-backed lazy tbl
Description
dplyr::collect() method for tables created by tbl_delta() and
tbl_parquet(). Verifies that the backing DuckDB connection is still open
and that the azure extension is loaded before the query is materialised,
then defers to the underlying dbplyr method.
Usage
## S3 method for class 'tbl_az'
collect(x, ...)
Arguments
x |
A |
... |
Passed on to the next |
Value
A tibble::tibble() with the collected rows.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
tbl_delta(conn, "abfss://container@account/path/sales") |>
dplyr::collect()
## End(Not run)
Default DuckDB connection
Description
Returns the DuckDB default connection (thin wrapper around
duckdb::default_conn()).
Usage
conn_default()
Value
A DuckDB connection.
Create a DuckDB driver
Description
Thin wrapper around duckdb::duckdb() with quak-friendly defaults.
Usage
conn_driver(
dbdir = ":memory:",
read_only = FALSE,
bigint = "numeric",
config = list(),
...,
unsigned = FALSE,
environment_scan = FALSE
)
Arguments
dbdir |
Character scalar. Database path. Defaults to |
read_only |
Logical. Open in read-only mode. Defaults to |
bigint |
Character scalar. How to represent 64-bit integers.
Defaults to |
config |
Named list of DuckDB configuration options. Defaults to
|
... |
Additional arguments passed to |
unsigned |
Logical. Allow loading unsigned (locally-built or
community) extensions — equivalent to |
environment_scan |
Logical. Scan the R environment for secrets.
Defaults to |
Value
A duckdb_driver object.
DuckDB instance info
Description
Queries the active connection for the library version and platform string.
The two values together identify the subdirectory path used by the extension
repositories, e.g. v1.2.0/osx_arm64/.
Usage
conn_info(conn = conn_default())
Arguments
conn |
A DuckDB connection. Defaults to |
Value
A named list with elements version (e.g. "v1.2.0") and
platform (e.g. "osx_arm64").
Create a DuckDB connection
Description
Thin wrapper around DBI::dbConnect() with quak-friendly defaults.
Usage
conn_open(..., drv = conn_driver(), timezone_out = "", array = "matrix")
Arguments
... |
Additional arguments passed to |
drv |
A DuckDB driver. Defaults to |
timezone_out |
Character scalar. Timezone for |
array |
Character scalar. How to represent DuckDB arrays.
Defaults to |
Value
A duckdb_connection object.
Get or set DuckDB settings
Description
When called with no arguments, returns all settings as a data frame. When name
is supplied and value is NULL, returns the value of that setting. When
both name and value are supplied, executes SET <name> = <value>.
Usage
conn_setting(conn = conn_default(), name = NULL, value = NULL)
Arguments
conn |
A DuckDB connection. |
name |
Optional character scalar. Setting name. |
value |
Optional value to set. Coerced to character; DuckDB casts it to the appropriate type. |
Value
All settings: a tibble::tibble(). Single setting read: a
character scalar. Write: conn invisibly.
Examples
conn <- DBI::dbConnect(duckdb::duckdb())
conn_setting(conn, "threads")
DBI::dbDisconnect(conn, shutdown = TRUE)
Ensure Azure-related extensions are loaded
Description
Loads the azure extension and, optionally, the delta extension on
conn. Does not auto-install.
Usage
ensure_azure_exts(conn, delta = FALSE)
Arguments
conn |
A DuckDB connection. |
delta |
Logical. Also load the |
Value
Invisibly returns NULL; called for its side effect of loading the
azure (and optionally delta) extension onto conn.
Extension cache
Description
Builds an ext_cache object: a list of closures bound to a cache directory,
implementing CRUD over cached .duckdb_extension files. Files are laid out
under <cache_path>/<version>/<platform>/<name>.duckdb_extension.
Usage
ext_cache(cache_path = ext_cache_path())
Arguments
cache_path |
Character scalar. Cache root directory. Defaults to
|
Value
An ext_cache object (a list of closures) with elements:
- .path: the cache root.
- get(name, version, platform): path to the cached extension, or NULL.
- add(name, version, platform, src): copies src into the cache.
- list(): data frame of cached extensions.
- del(name, version, platform): removes a cached extension. When version and platform are omitted, removes all cached entries for name.
Examples
cache <- ext_cache(file.path(tempdir(), "quak-cache"))
cache$.path
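The "list of closures bound to a cache directory" design can be illustrated with a reduced version that implements only `.path`, `get()`, and `add()`. `sketch_cache()` is hypothetical and only mirrors the documented file layout; the package's own object has more elements and may differ in detail.

```r
# Hypothetical, reduced sketch of a closure-based cache. Files live under
# <cache_path>/<version>/<platform>/<name>.duckdb_extension.
sketch_cache <- function(cache_path) {
  entry <- function(name, version, platform) {
    file.path(cache_path, version, platform,
              paste0(name, ".duckdb_extension"))
  }
  list(
    .path = cache_path,
    # Returns the cached file path, or NULL when nothing is cached.
    get = function(name, version, platform) {
      path <- entry(name, version, platform)
      if (file.exists(path)) path else NULL
    },
    # Copies `src` into the cache, creating directories as needed.
    add = function(name, version, platform, src) {
      dest <- entry(name, version, platform)
      dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
      file.copy(src, dest, overwrite = TRUE)
      invisible(dest)
    }
  )
}

cache <- sketch_cache(file.path(tempdir(), "sketch-cache"))
src <- tempfile(fileext = ".duckdb_extension")
writeLines("placeholder", src)
cache$add("httpfs", "v1.2.0", "osx_arm64", src)
cache$get("httpfs", "v1.2.0", "osx_arm64")
```

Because every closure captures `cache_path`, callers can pass the cache around as one value without repeating the directory.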
Default DuckDB extension cache directory
Description
Resolution order: in-memory value (opts$set("cache_dir", ...)) ->
env var QUAK_CACHE_DIR -> OS-appropriate user cache directory via
tools::R_user_dir().
Usage
ext_cache_path()
Value
Character scalar. The resolved cache path.
Examples
ext_cache_path()
Find the DuckDB extension folder
Description
Returns the path where DuckDB stores installed extension files.
This is determined by the extension_directory setting.
Usage
ext_dir(conn = conn_default())
Arguments
conn |
A DuckDB connection. Defaults to |
Value
Character scalar. Path to the extension directory.
Examples
conn <- DBI::dbConnect(duckdb::duckdb())
ext_dir(conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
Install a DuckDB extension
Description
Tries two strategies in order (a SQL install, then a manual fallback; see Details), succeeding as soon as one works.
Usage
ext_install(
name,
cache = ext_cache(),
repo = c("core", "community"),
conn = conn_default(),
verbose = NULL
)
Arguments
name |
Character scalar. Extension name. |
cache |
An |
repo |
|
conn |
A DuckDB connection. Defaults to |
verbose |
Logical or |
Details
- SQL install: runs DuckDB's built-in INSTALL (using the configured repository URL when one is set via repo_set_urls(), the QUAK_CORE_REPO / QUAK_COMMUNITY_REPO env vars, or the quak.core_repo / quak.community_repo R options).
- Manual fallback: when the SQL install fails (e.g. DuckDB cannot reach an HTTPS URL before httpfs is loaded, whereas R's curl can), downloads the .duckdb_extension file, caches it, and copies it into the extension directory.
A SQL failure is never raised on its own — it only surfaces (as a warning,
when verbose = TRUE) if the manual fallback also runs. An error is raised
only when both strategies fail.
Idempotent: skips the install if the extension is already installed
(checked via the duckdb_extensions() table function).
Value
Invisibly returns conn.
Examples
## Not run:
# Requires network access to download the extension.
conn <- DBI::dbConnect(duckdb::duckdb())
ext_install("httpfs", conn = conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
## End(Not run)
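The error-handling contract of the two strategies (warn about the SQL failure only when verbose, raise only when both fail) can be sketched independently of DuckDB. `sketch_install_with_fallback()` is hypothetical and takes the two strategies as plain functions; it omits the already-installed check.

```r
# Hypothetical sketch (not the package's implementation): run a primary
# strategy, fall back to a second one, and error only if both fail.
sketch_install_with_fallback <- function(sql_install, manual_install,
                                         verbose = FALSE) {
  # NULL on success, the condition object on failure.
  sql_error <- tryCatch({
    sql_install()
    NULL
  }, error = function(e) e)
  if (is.null(sql_error)) {
    return(invisible("sql"))
  }

  # The SQL failure surfaces only as a warning, and only when verbose.
  if (isTRUE(verbose)) {
    warning("SQL install failed: ", conditionMessage(sql_error),
            call. = FALSE)
  }

  manual_error <- tryCatch({
    manual_install()
    NULL
  }, error = function(e) e)
  if (is.null(manual_error)) {
    return(invisible("manual"))
  }

  stop(
    "Both install strategies failed.\n",
    "  SQL: ", conditionMessage(sql_error), "\n",
    "  Manual: ", conditionMessage(manual_error),
    call. = FALSE
  )
}

works <- function() TRUE
fails <- function() stop("unreachable repository")
print(sketch_install_with_fallback(fails, works))
```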
Install a DuckDB extension from a local file
Description
Executes INSTALL '/path/to/ext.duckdb_extension' on conn. Use this to
install an extension binary you already have on disk without going through a
remote repository.
Usage
ext_install_local(path, name = NULL, conn = conn_default())
Arguments
path |
Character scalar. Path to the |
name |
Character scalar. Extension name used in messages. Inferred
from |
conn |
A DuckDB connection. Defaults to |
Value
Invisibly returns conn.
Examples
## Not run:
# Requires a local DuckDB extension file at the given path.
conn <- DBI::dbConnect(duckdb::duckdb())
ext_install_local("/path/to/httpfs.duckdb_extension", conn = conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
## End(Not run)
Install an extension manually using the cache
Description
Looks the extension up in cache. On a hit the cached
.duckdb_extension file is copied into the connection's
extension_directory. On a miss ext_download() is invoked first to
populate the cache, and the freshly-cached file is copied.
Usage
ext_install_manual(
name,
cache = ext_cache(),
repo = "core",
conn = conn_default()
)
Arguments
name |
Character scalar. Extension name. |
cache |
An |
repo |
|
conn |
A DuckDB connection. |
Value
Invisibly returns conn.
Install an extension via DuckDB's SQL INSTALL command
Description
Install an extension via DuckDB's SQL INSTALL command
Usage
ext_install_sql(
name,
repo = c("core", "community"),
repo_url = NULL,
conn = conn_default()
)
Arguments
name |
Character scalar. Extension name. |
repo |
|
repo_url |
Character scalar or |
conn |
A DuckDB connection. |
Value
Invisibly returns conn.
Check whether a DuckDB extension is installed
Description
Check whether a DuckDB extension is installed
Usage
ext_is_installed(name, conn = conn_default())
Arguments
name |
Character scalar. Extension name. |
conn |
A DuckDB connection. Defaults to |
Value
Logical scalar.
Examples
conn <- DBI::dbConnect(duckdb::duckdb())
ext_is_installed("httpfs", conn = conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
List all DuckDB core extensions
Description
Returns the full catalog of extensions maintained by the DuckDB core team, regardless of whether they are installed.
Usage
ext_list_available(conn = conn_default())
Arguments
conn |
A DuckDB connection. Defaults to |
Value
A tibble::tibble() with columns: name, version, description.
Examples
conn <- DBI::dbConnect(duckdb::duckdb())
ext_list_available(conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
List installed DuckDB extensions
Description
Queries duckdb_extensions(), returning only extensions where installed = TRUE.
Usage
ext_list_installed(conn = conn_default())
Arguments
conn |
A DuckDB connection. Defaults to |
Value
A tibble::tibble() with columns: name, installed, loaded, version, description.
Examples
conn <- DBI::dbConnect(duckdb::duckdb())
ext_list_installed(conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
Load a DuckDB extension, installing it first if necessary
Description
When path is supplied, executes LOAD '/path/to/ext.duckdb_extension'
directly — no install check or auto-install occurs. When only name is
supplied, returns immediately if the extension is already loaded. Otherwise
it checks whether the extension is installed; if not and
auto_install = TRUE, installs it (prompting first when ask = TRUE and
the session is interactive), then executes LOAD <name>.
Usage
ext_load(
name = NULL,
path = NULL,
conn = conn_default(),
auto_install = TRUE,
ask = rlang::is_interactive(),
cache = ext_cache(),
repo = c("core", "community")
)
Arguments
name |
Character scalar. Extension name. When |
path |
Optional character scalar. Path to a local
|
conn |
A DuckDB connection. Defaults to |
auto_install |
Logical. Install automatically when the extension is
missing. Default |
ask |
Logical. Prompt the user before installing. Defaults to
|
cache |
An |
repo |
|
Value
Invisibly returns conn.
Examples
## Not run:
# Requires network access to download and load the extension.
conn <- DBI::dbConnect(duckdb::duckdb())
ext_load("httpfs", conn = conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
## End(Not run)
Set the DuckDB extension folder
Description
Changes the path where DuckDB stores installed extension files for conn.
The value is written to DuckDB's extension_directory setting.
Usage
ext_set_dir(path, conn = conn_default(), create = TRUE)
Arguments
path |
Character scalar. Path to the extension directory. |
conn |
A DuckDB connection. Defaults to |
create |
Logical. If |
Value
Invisibly returns the normalized extension directory path.
Examples
conn <- DBI::dbConnect(duckdb::duckdb())
ext_set_dir(file.path(tempdir(), "quak-exts"), conn = conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
Uninstall a DuckDB extension
Description
Removes the extension file from DuckDB's extension_directory. Optionally
also purges the corresponding entry from the local cache.
Usage
ext_uninstall(
name,
purge_cache = FALSE,
cache = ext_cache(),
conn = conn_default()
)
Arguments
name |
Character scalar. Extension name. |
purge_cache |
Logical. If |
cache |
An |
conn |
A DuckDB connection. Defaults to |
Value
Invisibly returns conn.
Examples
## Not run:
# Requires a connection with the extension already installed.
conn <- DBI::dbConnect(duckdb::duckdb())
ext_uninstall("httpfs", conn = conn)
DBI::dbDisconnect(conn, shutdown = TRUE)
## End(Not run)
Build an extension download URL
Description
Combines a repository base URL with the running DuckDB version and platform
to produce the full path to an extension archive. When repo_url is
supplied it overrides the configured repo URL.
Usage
ext_url(
ext,
repo = c("core", "community"),
repo_url = NULL,
conn = conn_default()
)
Arguments
ext |
Character scalar. Extension name. |
repo |
|
repo_url |
Optional character scalar. Explicit base URL overriding
|
conn |
A DuckDB connection. Defaults to |
Value
A character scalar URL.
Decompress a gzip file to a destination path
Description
Streams src (a gzip-compressed file) through gzfile() into dest,
fully closing both connections before returning. Closing the output
connection flushes R's internal buffer to disk; skipping that step can
leave the trailing bytes — where a .duckdb_extension stores its metadata
footer — unwritten, yielding a corrupt file.
Usage
gunzip_file(src, dest)
Arguments
src |
Character scalar. Path to the gzip-compressed source file. |
dest |
Character scalar. Path to write the decompressed output to. |
Value
Invisibly returns dest.
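A base-R sketch of this streaming pattern is shown below. `sketch_gunzip()` and its `chunk` argument are hypothetical; the point is that both connections are closed via on.exit() before the function returns, so buffered trailing bytes reach the disk.

```r
# Hypothetical sketch (not the package's implementation): stream a gzip
# file to `dest` in chunks, closing both connections before returning.
sketch_gunzip <- function(src, dest, chunk = 1024L * 1024L) {
  input <- gzfile(src, open = "rb")
  on.exit(close(input), add = TRUE)
  output <- file(dest, open = "wb")
  # Closing the output connection flushes R's buffer to disk.
  on.exit(close(output), add = TRUE)

  repeat {
    bytes <- readBin(input, what = "raw", n = chunk)
    if (length(bytes) == 0L) break
    writeBin(bytes, output)
  }
  invisible(dest)
}

# Round trip on a small temporary file.
src <- tempfile(fileext = ".gz")
con <- gzfile(src, open = "wb")
writeBin(as.raw(1:200), con)
close(con)

dest <- tempfile()
sketch_gunzip(src, dest)
file.size(dest)
```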
Register a CSV dataset as a view on a DuckDB connection
Description
Validates the URL, loads the azure extension, then registers the dataset
as a VIEW over read_csv_auto(). Use az_conn() first if the connection
needs an Azure secret. Returns conn invisibly — use tbl_csv() if you
want a dplyr::tbl().
Usage
load_csv(conn, url, name, replace = TRUE, ...)
Arguments
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. Supports glob patterns. |
name |
Character scalar. Name to register the view under in DuckDB. |
replace |
Logical. Replace an existing view. Default |
... |
Reader options forwarded to DuckDB's |
Value
Invisibly returns conn.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
load_csv(conn, "abfss://container@account/data/*.csv", name = "events")
## End(Not run)
Register a Delta, Parquet, CSV, or JSON dataset on a DuckDB connection
Description
Dispatches to load_delta(), load_parquet(), load_csv(), or
load_json() based on format. Only arguments accepted by the target
function may be passed via ...; passing format-incompatible arguments
raises an error.
Usage
load_dataset(
conn,
url,
name,
format = c("delta", "parquet", "csv", "json"),
...
)
Arguments
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. |
name |
Character scalar. Name to register the dataset under in DuckDB. |
format |
One of |
... |
Passed to the selected loader. |
Value
Invisibly returns conn.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
load_dataset(
conn,
"abfss://container@account/path/sales",
name = "sales",
format = "delta"
)
## End(Not run)
Register a Delta Lake table on a DuckDB connection
Description
Validates the URL, loads the azure and delta extensions, then registers
the table either as an ATTACH database or a VIEW. Use az_conn() first if
the connection needs an Azure secret. Returns conn invisibly — use
tbl_delta() if you want a dplyr::tbl().
Usage
load_delta(
conn,
url,
name,
method = c("attach", "view"),
replace = TRUE,
version = NULL,
timestamp = NULL
)
Arguments
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL pointing to a Delta table. |
name |
Character scalar. Name to register the table under in DuckDB. |
method |
|
replace |
Logical. Replace an existing registration. Default |
version |
Optional non-negative Delta table version to attach. |
timestamp |
Optional Delta table timestamp to attach. Only one of
|
Value
Invisibly returns conn.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
load_delta(conn, "abfss://container@account/path/sales", name = "sales")
DBI::dbGetQuery(conn, "SELECT COUNT(*) FROM sales")
## End(Not run)
Register a JSON dataset as a view on a DuckDB connection
Description
Validates the URL, loads the azure extension, then registers the dataset
as a VIEW over read_json_auto(). Use az_conn() first if the connection
needs an Azure secret. Returns conn invisibly — use tbl_json() if you
want a dplyr::tbl().
Usage
load_json(conn, url, name, replace = TRUE, ...)
Arguments
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. Supports glob patterns. |
name |
Character scalar. Name to register the view under in DuckDB. |
replace |
Logical. Replace an existing view. Default |
... |
Reader options forwarded to DuckDB's |
Value
Invisibly returns conn.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
load_json(conn, "abfss://container@account/data/*.json", name = "events")
## End(Not run)
Register a Parquet dataset as a view on a DuckDB connection
Description
Validates the URL, loads the azure extension, then registers the dataset
as a VIEW. Use az_conn() first if the connection needs an Azure secret.
Returns conn invisibly — use tbl_parquet() if you want a dplyr::tbl().
Usage
load_parquet(conn, url, name, hive_partitioning = FALSE, replace = TRUE)
Arguments
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. Supports glob patterns. |
name |
Character scalar. Name to register the view under in DuckDB. |
hive_partitioning |
Logical. Enable Hive partition inference. Default |
replace |
Logical. Replace an existing view. Default |
Value
Invisibly returns conn.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
load_parquet(conn, "abfss://container@account/data/*.parquet", name = "events")
## End(Not run)
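Registering a view over a Parquet scan boils down to a single SQL statement. The sketch below builds one as a string; `sketch_parquet_view_sql()` is a hypothetical helper, and the exact statement load_parquet() issues is an assumption. read_parquet() and its hive_partitioning parameter are DuckDB features.

```r
# Hypothetical helper (not part of quak): builds a CREATE VIEW statement
# over DuckDB's read_parquet(). Nothing is executed.
sketch_parquet_view_sql <- function(url, name, hive_partitioning = FALSE,
                                    replace = TRUE) {
  quote_lit <- function(x) paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
  quote_id <- function(x) paste0('"', gsub('"', '""', x, fixed = TRUE), '"')

  create <- if (isTRUE(replace)) "CREATE OR REPLACE VIEW" else "CREATE VIEW"
  hive <- if (isTRUE(hive_partitioning)) "true" else "false"

  paste0(
    create, " ", quote_id(name),
    " AS SELECT * FROM read_parquet(", quote_lit(url),
    ", hive_partitioning = ", hive, ")"
  )
}

sketch_parquet_view_sql(
  "abfss://container@account/data/*.parquet",
  name = "events"
)
```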
Tag a lazy tbl as Azure-backed
Description
Prepends the tbl_az S3 class to a lazy dplyr::tbl() so that
collect.tbl_az() can run pre-flight checks before the query is
materialised.
Usage
new_tbl_az(x)
Arguments
x |
A lazy |
Value
x with "tbl_az" prepended to its class vector.
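The class-tagging idea can be shown with a toy S3 generic standing in for dplyr::collect(). Every name below (`materialise()`, `sketch_tag_az()`, the `conn_open` attribute) is hypothetical; the sketch only demonstrates how a prepended class lets a method run checks and then hand over to the next method.

```r
# Toy generic standing in for dplyr::collect(); all names are hypothetical.
materialise <- function(x, ...) UseMethod("materialise")
materialise.default <- function(x, ...) "collected"

# Prepend the marker class so its method is dispatched first.
sketch_tag_az <- function(x) {
  class(x) <- c("tbl_az", class(x))
  x
}

# Pre-flight check, then defer to the next method in the class vector.
materialise.tbl_az <- function(x, ...) {
  if (!isTRUE(attr(x, "conn_open"))) {
    stop("The backing connection is closed.", call. = FALSE)
  }
  NextMethod()
}

lazy <- structure(list(), class = "tbl_lazy", conn_open = TRUE)
tagged <- sketch_tag_az(lazy)
class(tagged)
materialise(tagged)
```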
Print the quak option registry
Description
Renders one row per option with its current (resolved) value, the source that value came from, the environment variable that can override it (and whether it is set), and the built-in default.
Usage
## S3 method for class 'quak_opts'
print(x, mask = TRUE, ...)
Arguments
x |
A |
mask |
Logical. When |
... |
Unused. |
Value
Invisibly returns x.
List all quak options and their current values
Description
Prints every quak option (via print.quak_opts()) and invisibly returns a
tibble of the same information. The resolution order is: value set via
options(quak.*) -> the option's env var -> a built-in default.
Usage
quak_options(mask = TRUE)
Arguments
mask |
Logical. When |
Value
Invisibly, a tibble::tibble() with columns option, value,
source, env_var, env_value, and default.
Examples
quak_options()
Check which extension names are available in a repository
Description
Sends a HEAD request for each name in ext against the given repository
and reports which ones the server serves (2xx). Use this to discover which
extensions are actually published for the running DuckDB version and
platform.
Usage
repo_check(
repo = c("core", "community"),
ext = NULL,
conn = conn_default(),
verbose = FALSE
)
Arguments
repo |
|
ext |
Character vector of extension names to probe. Required. |
conn |
A DuckDB connection. Defaults to |
Value
Invisibly returns a named logical vector, one element per name in
ext.
Assemble an extension download URL
Description
Pure URL builder — no I/O, no connection required. Concatenates the
repository base URL, DuckDB version, platform, and extension filename with
/ (correct for URLs on all platforms).
Usage
repo_ext_url(repo_url, version, platform, name)
Arguments
repo_url |
Character scalar. Repository base URL. |
version |
Character scalar. DuckDB version string (e.g. |
platform |
Character scalar. Platform string (e.g. |
name |
Character scalar. Extension name (e.g. |
Value
Character scalar. Full URL to the .duckdb_extension.gz file.
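A pure builder of this kind is a one-liner in base R. `sketch_ext_url()` is a hypothetical equivalent, and stripping trailing slashes from the base URL is an added assumption, not documented behaviour.

```r
# Hypothetical equivalent (not the package's implementation): joins the
# URL parts with "/" regardless of the host platform.
sketch_ext_url <- function(repo_url, version, platform, name) {
  # Assumption: trailing slashes on the base URL are dropped first.
  base <- sub("/+$", "", repo_url)
  paste(base, version, platform,
        paste0(name, ".duckdb_extension.gz"),
        sep = "/")
}

sketch_ext_url("https://extensions.example.com", "v1.2.0", "osx_arm64", "httpfs")
```

Using paste() with sep = "/" rather than file.path() keeps the intent explicit: the result is a URL, not a filesystem path.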
Set DuckDB extension repository URLs
Description
Stores URLs in R options quak.core_repo / quak.community_repo so they
can be configured org-wide in .Rprofile. When core is supplied, also
sets DuckDB's custom_extension_repository on conn; passing NULL
resets that connection setting to DuckDB's default.
Usage
repo_set_urls(
core = NULL,
community = NULL,
check = TRUE,
conn = conn_default()
)
Arguments
core |
Optional character scalar. URL for the core extension repository.
Omit to leave the current value unchanged. Pass |
community |
Optional character scalar. URL for the community extension
repository. Omit to leave the current value unchanged. Pass |
check |
Logical. If |
conn |
A DuckDB connection. Defaults to |
Value
Invisibly returns a named list with elements core and community
reflecting the current option values.
Examples
old <- repo_urls()
repo_set_urls(core = "https://extensions.example.com", check = FALSE)
repo_urls()
repo_set_urls(core = old$core, check = FALSE)
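Because the URLs live in ordinary R options, an organization can also set them once in a site-wide or user `.Rprofile` without calling any quak function. The fragment below is a sketch; both URLs are placeholders for an internal mirror, not real endpoints.

```r
# .Rprofile fragment: point quak at internal mirrors (placeholder URLs).
options(
  quak.core_repo = "https://extensions.example.com",
  quak.community_repo = "https://community-extensions.example.com"
)

# The values are then visible to any code that reads the options.
getOption("quak.core_repo")
```

Unlike `repo_set_urls()`, setting the options directly only stores the values; it does not touch the `custom_extension_repository` setting of an already open connection.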
Get DuckDB extension repository URLs
Description
Returns the currently active repository URLs. Resolution order per repo:
R option (quak.core_repo / quak.community_repo) -> env var
(QUAK_CORE_REPO / QUAK_COMMUNITY_REPO) -> built-in default.
Usage
repo_urls()
Value
A named list with elements core and community.
Examples
repo_urls()
Open a CSV dataset as a lazy dplyr tbl
Description
Validates the URL, loads the azure extension, then returns a lazy
dplyr::tbl() over the dataset. Use az_conn() first if the connection
needs Azure extensions, settings, or secrets.
Usage
tbl_csv(conn, url, name = NULL, replace = TRUE, ...)
Arguments
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. Supports glob patterns. |
name |
Optional character scalar. Name to register the view under in
DuckDB. When |
replace |
Logical. Replace an existing view of the same name.
Default |
... |
Reader options forwarded to DuckDB's |
Details
When name is NULL the dataset is queried directly via read_csv_auto()
with no persistent object registered on the connection. When name is
supplied the dataset is first registered as a VIEW via load_csv(), then
referenced by name.
Value
A dplyr::tbl() backed by the CSV dataset.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
tbl_csv(conn, "abfss://container@account/data/*.csv") |>
  dplyr::collect()
## End(Not run)
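The difference between the two modes described in Details can be made concrete by looking at the kind of SQL each one implies. The sketch below only builds strings; `csv_source_sql()` and its quoting helpers are hypothetical, and the exact statements quak issues may differ.

```r
# Hypothetical helpers for quoting SQL string literals and identifiers.
sql_string <- function(x) paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
sql_ident <- function(x) paste0('"', gsub('"', '""', x, fixed = TRUE), '"')

# Hypothetical helper: SQL implied by a NULL name versus a supplied name.
csv_source_sql <- function(url, name = NULL, replace = TRUE) {
  scan <- sprintf("SELECT * FROM read_csv_auto(%s)", sql_string(url))
  if (is.null(name)) {
    # No name: query the files directly, nothing is registered.
    return(scan)
  }
  # Name supplied: register a view that later queries refer to by name.
  create <- if (replace) "CREATE OR REPLACE VIEW" else "CREATE VIEW"
  sprintf("%s %s AS %s", create, sql_ident(name), scan)
}

csv_source_sql("abfss://container@account/data/*.csv")
csv_source_sql("abfss://container@account/data/*.csv", name = "sales")
```

A registered view persists for the lifetime of the connection, so it suits datasets that several queries share; the direct form leaves the connection untouched.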
Open a Delta Lake table as a lazy dplyr tbl
Description
Validates the URL, loads the azure and delta extensions, then returns a
lazy dplyr::tbl() over the table. Use az_conn() first if the connection
needs Azure extensions, settings, or secrets.
Usage
tbl_delta(
  conn,
  url,
  name = NULL,
  method = c("attach", "view"),
  replace = TRUE,
  version = NULL,
  timestamp = NULL
)
Arguments
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL pointing to a Delta table
(e.g. |
name |
Optional character scalar. Name to register the table under in
DuckDB. When |
method |
|
replace |
Logical. Replace an existing registration of the same name.
Default |
version |
Optional non-negative Delta table version to read. |
timestamp |
Optional Delta table timestamp to read. Only one of
|
Details
When name is NULL the table is queried directly via delta_scan() with
no persistent object registered on the connection. When name is supplied
the table is first registered via load_delta() (as an ATTACH database or a
VIEW depending on method), then referenced by name.
Delta time travel currently requires name because DuckDB exposes
version and timestamp through ATTACH, not delta_scan().
Value
A dplyr::tbl() backed by the Delta table.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
tbl_delta(conn, "abfss://container@account/path/sales") |>
  dplyr::filter(amount > 100) |>
  dplyr::collect()
## End(Not run)
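The time-travel constraints stated above (a non-negative version, a single selector, and a required name) can be expressed as a small argument check. `check_time_travel()` is a hypothetical helper for illustration; treating `version` as a whole number and treating `version` and `timestamp` as mutually exclusive are assumptions read from the argument descriptions, not confirmed behavior.

```r
# Hypothetical validator for the time-travel arguments of a Delta reader.
check_time_travel <- function(name = NULL, version = NULL, timestamp = NULL) {
  # Assumption: at most one selector may be given.
  if (!is.null(version) && !is.null(timestamp)) {
    stop("Supply only one of `version` and `timestamp`.", call. = FALSE)
  }
  if (!is.null(version)) {
    ok <- is.numeric(version) && length(version) == 1L && !is.na(version) &&
      version >= 0 && version == trunc(version)
    if (!ok) {
      stop("`version` must be a single non-negative whole number.", call. = FALSE)
    }
  }
  # Time travel goes through a registered object, so a name is needed.
  if ((!is.null(version) || !is.null(timestamp)) && is.null(name)) {
    stop("Time travel requires `name`.", call. = FALSE)
  }
  invisible(TRUE)
}

check_time_travel(name = "sales_v3", version = 3)  # passes silently
```

Failing early in R gives a clearer message than letting an invalid combination reach DuckDB.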
Open a JSON dataset as a lazy dplyr tbl
Description
Validates the URL, loads the azure extension, then returns a lazy
dplyr::tbl() over the dataset. Use az_conn() first if the connection
needs Azure extensions, settings, or secrets.
Usage
tbl_json(conn, url, name = NULL, replace = TRUE, ...)
Arguments
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. Supports glob patterns. |
name |
Optional character scalar. Name to register the view under in
DuckDB. When |
replace |
Logical. Replace an existing view of the same name.
Default |
... |
Reader options forwarded to DuckDB's |
Details
When name is NULL the dataset is queried directly via read_json_auto()
with no persistent object registered on the connection. When name is
supplied the dataset is first registered as a VIEW via load_json(), then
referenced by name.
Value
A dplyr::tbl() backed by the JSON dataset.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
tbl_json(conn, "abfss://container@account/data/*.json") |>
  dplyr::collect()
## End(Not run)
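Reader options passed through `...` eventually have to become named parameters in a DuckDB function call. One way this translation could work is sketched below; `render_reader_options()` is a hypothetical helper, not quak's code, and it handles only scalar logical, numeric, and character values. The option names in the demo (`format`, `ignore_errors`) are parameters of DuckDB's JSON reader.

```r
# Hypothetical helper: turn R named arguments into "key = value" SQL text.
render_reader_options <- function(...) {
  opts <- list(...)
  if (length(opts) == 0L) {
    return("")
  }
  # Every option must be named, scalar, and non-missing.
  stopifnot(!is.null(names(opts)), all(nzchar(names(opts))))
  values <- vapply(opts, function(x) {
    stopifnot(length(x) == 1L, !is.na(x))
    if (is.logical(x)) {
      if (x) "true" else "false"            # SQL boolean literal
    } else if (is.numeric(x)) {
      format(x, scientific = FALSE)         # plain number
    } else {
      # Quote strings and escape embedded single quotes.
      paste0("'", gsub("'", "''", as.character(x), fixed = TRUE), "'")
    }
  }, character(1))
  paste(names(opts), values, sep = " = ", collapse = ", ")
}

render_reader_options(format = "newline_delimited", ignore_errors = TRUE)
# "format = 'newline_delimited', ignore_errors = true"
```

Under that assumption, a call such as `tbl_json(conn, url, format = "newline_delimited")` would forward the option unchanged to the reader.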
Open a Parquet dataset as a lazy dplyr tbl
Description
Validates the URL, loads the azure extension, then returns a lazy
dplyr::tbl() over the dataset. Use az_conn() first if the connection
needs Azure extensions, settings, or secrets.
Usage
tbl_parquet(conn, url, name = NULL, hive_partitioning = FALSE, replace = TRUE)
Arguments
conn |
A DuckDB connection. |
url |
Character scalar. Azure Blob URL. Supports glob patterns for
multi-file datasets
(e.g. |
name |
Optional character scalar. Name to register the view under in
DuckDB. When |
hive_partitioning |
Logical. Enable Hive partition inference from the
directory structure. Default |
replace |
Logical. Replace an existing view of the same name.
Default |
Details
When name is NULL the dataset is queried directly via read_parquet()
with no persistent object registered on the connection. When name is
supplied the dataset is first registered as a VIEW via load_parquet(), then
referenced by name. Glob patterns (e.g. "*.parquet") are supported in
url for multi-file datasets.
Value
A dplyr::tbl() backed by the Parquet dataset.
Examples
## Not run:
# Requires a live Azure account, credentials, and network access.
conn <- az_conn()
tbl_parquet(conn, "abfss://container@account/data/*.parquet") |>
  dplyr::collect()
## End(Not run)
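Hive partition inference reads `key=value` directory names and exposes them as columns. The idea can be illustrated in base R; `hive_partitions()` is a hypothetical helper and the path is made up. DuckDB performs this inference itself (including type detection) when `hive_partitioning` is enabled, so the sketch is purely explanatory.

```r
# Hypothetical helper: extract key=value directory segments from a path.
hive_partitions <- function(path) {
  segments <- strsplit(path, "/", fixed = TRUE)[[1]]
  # Keep only segments shaped like "key=value".
  pairs <- segments[grepl("^[^=]+=[^=]*$", segments)]
  keys <- sub("=.*$", "", pairs)
  values <- sub("^[^=]+=", "", pairs)
  stats::setNames(as.list(values), keys)
}

hive_partitions(
  "abfss://container@account/data/year=2024/month=01/part-0.parquet"
)
# list(year = "2024", month = "01")
```

With a layout like this, a glob such as `data/*/*/*.parquet` combined with `hive_partitioning = TRUE` would make `year` and `month` available as columns to filter on.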