| Title: | Clinical Trial Example Datasets |
| Version: | 0.1.0 |
| Description: | A collection of clinical trial example datasets from multiple sources including the CDISC Pilot 01 study (CDISC https://www.cdisc.org/). All datasets are provided in Parquet format for efficient storage and can be accessed using the 'connector' package. Designed for training, testing, prototyping, and demonstrating clinical data analysis workflows. |
| Depends: | R (≥ 4.1.0) |
| License: | Apache License (≥ 2) |
| URL: | https://lovemore-gakava.github.io/clinTrialData/, https://github.com/Lovemore-Gakava/clinTrialData |
| BugReports: | https://github.com/Lovemore-Gakava/clinTrialData/issues |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Imports: | connector, httr, jsonlite, piggyback, tools |
| Suggests: | arrow, dplyr, ggplot2, testthat (≥ 3.0.0), knitr, rmarkdown, tidyr |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-02-25 21:48:59 UTC; lovemore.gakavagmail.com |
| Author: | Lovemore Gakava [aut, cre, cph] |
| Maintainer: | Lovemore Gakava <Lovemore.Gakava@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-03 11:20:08 UTC |
Load a stale study listing and refresh the cached column
Description
Load a stale study listing and refresh the cached column
Usage
.load_stale_studies(reason)
Arguments
reason |
Character string describing why the fallback is needed. |
Value
A data frame, or NULL if no cache exists.
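The fallback behaviour described above can be sketched roughly as follows. This is an illustration only: the function name `load_stale_listing`, its two path arguments, and the use of an RDS file for the stored listing are assumptions, not taken from the package source.

```r
# Hypothetical sketch of an offline fallback; not the package's implementation.
load_stale_listing <- function(listing_path, cache_root, reason) {
  if (!file.exists(listing_path)) {
    return(NULL)  # no previous listing to fall back on
  }
  studies <- readRDS(listing_path)
  # Recompute `cached` from the file system instead of trusting the stored value
  studies$cached <- dir.exists(file.path(cache_root, studies$source))
  warning("Using cached study listing: ", reason, call. = FALSE)
  studies
}

# Demo with temporary files: only study_a has a folder in the cache
cache_root <- file.path(tempdir(), "stale_demo_cache")
dir.create(file.path(cache_root, "study_a"), recursive = TRUE, showWarnings = FALSE)
listing_path <- file.path(tempdir(), "stale_demo_listing.rds")
saveRDS(
  data.frame(source = c("study_a", "study_b"), cached = c(FALSE, TRUE)),
  listing_path
)
stale <- suppressWarnings(
  load_stale_listing(listing_path, cache_root, reason = "GitHub unreachable")
)
```

The stored `cached` values are deliberately wrong in the demo, so the result shows that the column is rebuilt from the directories that actually exist.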
Package onLoad hook
Description
Called when the package is loaded. Registers bundled and cached study folders as locked (in memory) to prevent accidental data modification. No files are written to disk.
Usage
.onLoad(libname, pkgname)
Arguments
libname |
Library name |
pkgname |
Package name |
Set directory permissions (Unix only)
Description
On Unix-like systems, sets the directory and its files to read-only (mode 0555/0444) or read-write (mode 0755/0644). This is a no-op on Windows, where these permission bits are not meaningful. Only applied to paths under the user cache directory.
Usage
.set_permissions(path, read_only = TRUE)
Arguments
path |
Directory path. |
read_only |
Logical; TRUE to make read-only, FALSE to restore. |
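A minimal sketch of how such a permission switch might be written with base R's Sys.chmod(). The helper name `set_permissions_sketch` is hypothetical, and the sketch omits the check that the path lies under the user cache directory.

```r
# Hypothetical sketch; the helper name and structure are assumptions.
set_permissions_sketch <- function(path, read_only = TRUE) {
  if (.Platform$OS.type != "unix") {
    return(invisible(FALSE))  # no-op where permission bits are not meaningful
  }
  dirs <- list.dirs(path, recursive = TRUE, full.names = TRUE)  # includes `path`
  files <- list.files(path, recursive = TRUE, full.names = TRUE, all.files = TRUE)
  if (read_only) {
    Sys.chmod(files, mode = "0444", use_umask = FALSE)
    Sys.chmod(dirs, mode = "0555", use_umask = FALSE)
  } else {
    # Restore write bits on directories and files
    Sys.chmod(dirs, mode = "0755", use_umask = FALSE)
    Sys.chmod(files, mode = "0644", use_umask = FALSE)
  }
  invisible(TRUE)
}

# Demo on a temporary directory
demo_dir <- file.path(tempdir(), "perm_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_file <- file.path(demo_dir, "adsl.parquet")
file.create(demo_file)
set_permissions_sketch(demo_dir, read_only = TRUE)
mode_locked <- format(file.info(demo_file)$mode)
set_permissions_sketch(demo_dir, read_only = FALSE)
mode_restored <- format(file.info(demo_file)$mode)
```

Directories keep their execute bit in both modes (0555 and 0755) so that their contents remain listable while read-only.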
Path to the cached study-listing file
Description
Returns the path where list_available_studies() stores its last
successful result for offline fallback.
Usage
.studies_cache_path()
Get the Local Cache Directory
Description
Returns the path to the local cache directory where downloaded clinical
trial datasets are stored. The location follows the platform-specific
user data directory convention via tools::R_user_dir().
You can delete any subdirectory here to remove a cached dataset, or clear the entire directory to free disk space.
Usage
cache_dir()
Value
A character string with the path to the cache directory.
Examples
cache_dir()
Check if a study folder can be written to
Description
Returns TRUE if the folder is not locked; FALSE with a warning otherwise.
Usage
can_write_study(study_path, operation = "write to study folder")
Arguments
study_path |
Path to the study folder |
operation |
Description of the operation being attempted |
Value
Logical indicating if the operation can proceed
Clinical Trial Datasets
Description
The clinTrialData package contains clinical trial datasets from multiple sources, stored in Parquet format. Data is accessed using connector functions.
Available Data Sources
CDISC Pilot 01 Study
The CDISC Pilot 01 study data includes both ADaM and SDTM domains.
ADaM datasets include:
ADSL: Subject-Level Analysis Dataset
ADAE: Adverse Events Analysis Dataset
ADLB: Laboratory Analysis Dataset (combined)
ADLBC: Laboratory Analysis Dataset (Chemistry)
ADLBH: Laboratory Analysis Dataset (Hematology)
ADLBHY: Laboratory Analysis Dataset (Hy's Law)
ADQSADAS: ADAS-Cog Questionnaire Analysis Dataset
ADQSCIBC: CIBC Questionnaire Analysis Dataset
ADQSNPIX: NPI-X Questionnaire Analysis Dataset
ADTTE: Time-to-Event Analysis Dataset
ADVS: Vital Signs Analysis Dataset
SDTM datasets include:
DM: Demographics
AE: Adverse Events
VS: Vital Signs
LB: Laboratory Test Results
And 18 additional domains (see list_data_sources() for details)
Usage
Data sources are discovered by scanning the package directory structure.
List available datasets with list_data_sources().
Access data using the connection function:
# Connect to any data source (e.g., CDISC Pilot data)
db <- connect_clinical_data("cdisc_pilot")
# List available datasets
db$adam$list_content_cnt()
# Read a dataset
adsl <- db$adam$read_cnt("adsl")
# See all available data sources
list_data_sources()
Data Format
Datasets are stored in Parquet format:
Columnar storage
Fast reads
Compression
Cross-platform compatibility
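As an illustration of what columnar storage allows, the sketch below writes a small Parquet file and reads back only two of its columns. It uses the built-in mtcars data as a stand-in for a clinical dataset and relies on the suggested arrow package, so it is skipped when arrow is not installed.

```r
cols <- NULL
if (requireNamespace("arrow", quietly = TRUE)) {
  tmp <- tempfile(fileext = ".parquet")
  arrow::write_parquet(mtcars, tmp)
  # Request a subset of columns instead of loading the whole table
  two_cols <- arrow::read_parquet(tmp, col_select = c("mpg", "cyl"))
  cols <- names(two_cols)
  unlink(tmp)
}
```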
Source
CDISC Pilot 01 Study Data
Various clinical trial data sources
References
CDISC. Clinical Data Interchange Standards Consortium. https://www.cdisc.org/
Connect to Clinical Data by Source
Description
Generic connection function that allows access to any data source in the package. Data sources are automatically discovered by scanning the package's example data directory structure.
Usage
connect_clinical_data(source = "cdisc_pilot")
Arguments
source |
Character string specifying the data source.
Use |
Value
A connectors object
Examples
# Connect to CDISC Pilot data
db <- connect_clinical_data("cdisc_pilot")
# List available datasets
db$adam$list_content_cnt()
# Read a dataset
adsl <- db$adam$read_cnt("adsl")
# List available sources
list_data_sources()
Connect to Data Source
Description
Generic function to connect to any data source by scanning its directory structure and generating the connector configuration dynamically. Wraps all filesystem connectors with lock protection.
Resolution order:
1. User cache (downloaded via download_study())
2. Package-bundled data (inst/exampledata/)
Usage
connect_to_source(source_name)
Arguments
source_name |
Name of the data source (e.g., "cdisc_pilot") |
Value
A connectors object
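The resolution order can be sketched as a lookup that prefers the user cache over the bundled data. The function `resolve_source_path` and its arguments are hypothetical names introduced for illustration; the package's internal logic may differ.

```r
# Hypothetical resolver; function and argument names are assumptions.
resolve_source_path <- function(source_name, cache_root, bundled_root) {
  candidates <- c(
    cached  = file.path(cache_root, source_name),
    bundled = file.path(bundled_root, source_name)
  )
  found <- candidates[dir.exists(candidates)]
  if (length(found) == 0) {
    stop("Unknown data source: ", source_name, call. = FALSE)
  }
  found[1]  # the cache entry wins when both locations exist
}

# Demo layout in a temporary directory
root <- file.path(tempdir(), "resolve_demo")
cache_root <- file.path(root, "cache")
bundled_root <- file.path(root, "exampledata")
dir.create(file.path(cache_root, "cdisc_pilot"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(bundled_root, "cdisc_pilot"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(bundled_root, "other_study"), recursive = TRUE, showWarnings = FALSE)

hit_both <- resolve_source_path("cdisc_pilot", cache_root, bundled_root)
hit_bundled <- resolve_source_path("other_study", cache_root, bundled_root)
```

The name of the returned element ("cached" or "bundled") records which location was used.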
Inspect a Clinical Trial Dataset Without Downloading
Description
Fetches and displays metadata for any study available in the
clinTrialData library – without downloading the full dataset. Metadata
includes the study description, available domains and datasets, subject
count, version, and data source attribution.
For studies already downloaded via download_study(), the metadata is read
from the local cache and works offline. For studies not yet downloaded, a
small JSON file (~2 KB) is fetched from the GitHub Release.
Usage
dataset_info(source, repo = "Lovemore-Gakava/clinTrialData")
Arguments
source |
Character string. Name of the study (e.g.
|
repo |
GitHub repository in the form |
Value
Invisibly returns the metadata as a named list.
Examples
dataset_info("cdisc_pilot")
Download a Clinical Trial Study Dataset
Description
Downloads a study dataset from a GitHub Release and stores it in the local
cache (see cache_dir()). Once downloaded, the study is available to
connect_clinical_data() without an internet connection.
Requires the piggyback package.
Usage
download_study(
source,
version = "latest",
force = FALSE,
repo = "Lovemore-Gakava/clinTrialData"
)
Arguments
source |
Character string. The name of the study to download (e.g.
|
version |
Character string. The release tag to download from. Defaults
to |
force |
Logical. If |
repo |
GitHub repository in the form |
Value
Invisibly returns the path to the cached study directory.
Examples
# Download the CDISC Pilot study
download_study("cdisc_pilot")
# Then connect as usual
db <- connect_clinical_data("cdisc_pilot")
Generate Connector Configuration from Directory Structure
Description
Scans a data source directory and generates a connector configuration list dynamically based on the available parquet files.
Usage
generate_connector_config(source_path)
Arguments
source_path |
Path to the data source directory |
Value
A list suitable for passing to connector::connect()
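A rough sketch of such a directory scan, assuming each domain (for example adam or sdtm) is a subdirectory holding Parquet files. The function name `scan_source_dir` is hypothetical, and the `datasources` / `backend` layout of the returned list is an assumption modelled on the connector package's YAML configuration; it should be checked against the connector documentation before use.

```r
# Hypothetical sketch; the configuration layout is an assumption.
scan_source_dir <- function(source_path) {
  domain_dirs <- list.dirs(source_path, recursive = FALSE, full.names = TRUE)
  # Keep only subdirectories that contain at least one Parquet file
  has_parquet <- vapply(
    domain_dirs,
    function(d) length(list.files(d, pattern = "\\.parquet$")) > 0,
    logical(1)
  )
  domain_dirs <- domain_dirs[has_parquet]
  list(
    datasources = lapply(domain_dirs, function(d) {
      list(
        name = basename(d),
        backend = list(type = "connector::connector_fs", path = d)
      )
    })
  )
}

# Demo: two domains with Parquet files, one folder without
src <- file.path(tempdir(), "config_demo")
for (d in c("adam", "sdtm", "docs")) {
  dir.create(file.path(src, d), recursive = TRUE, showWarnings = FALSE)
}
file.create(file.path(src, "adam", "adsl.parquet"))
file.create(file.path(src, "sdtm", "dm.parquet"))
file.create(file.path(src, "docs", "readme.txt"))

cfg <- scan_source_dir(src)
domain_names <- vapply(cfg$datasources, function(x) x$name, character(1))
```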
Get lock status for a study folder
Description
Returns information about the lock status of a study folder.
Usage
get_lock_status(study_path)
Arguments
study_path |
Path to the study folder |
Value
A list with components locked (logical) and path (character).
Check if a package is available
Description
Thin wrapper around requireNamespace() to allow mocking in tests.
Usage
has_package(pkg)
Arguments
pkg |
Package name. |
Value
Logical.
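A wrapper of this kind might look like the sketch below. The names `has_pkg_sketch` and `check_optional` are hypothetical, and whether the real function passes `quietly = TRUE` is an assumption.

```r
# Hypothetical sketch of a mockable availability check
has_pkg_sketch <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Typical guard before calling into an optional dependency
check_optional <- function(pkg) {
  if (!has_pkg_sketch(pkg)) {
    stop("Package '", pkg, "' is required for this feature.", call. = FALSE)
  }
  invisible(TRUE)
}
```

Routing the check through one small function gives tests a single binding to replace when simulating a missing package.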
Check if a study folder is locked
Description
Checks whether a study path is locked in the current session, indicating that the data should not be overwritten.
Usage
is_study_locked(study_path)
Arguments
study_path |
Path to the study folder |
Value
Logical indicating if the folder is locked
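One way to keep session-scoped locks is an environment used as a registry, as in the sketch below. All names here (`lock_registry`, `lock_path`, `unlock_path`, `is_path_locked`) are illustrative and not the package's internals.

```r
# Hypothetical in-memory lock registry; names are illustrative only.
lock_registry <- new.env(parent = emptyenv())

lock_path <- function(study_path, reason = "Package installed") {
  key <- normalizePath(study_path, mustWork = FALSE)
  assign(key, reason, envir = lock_registry)
  invisible(TRUE)
}

unlock_path <- function(study_path) {
  key <- normalizePath(study_path, mustWork = FALSE)
  if (exists(key, envir = lock_registry, inherits = FALSE)) {
    rm(list = key, envir = lock_registry)
  }
  invisible(TRUE)
}

is_path_locked <- function(study_path) {
  key <- normalizePath(study_path, mustWork = FALSE)
  exists(key, envir = lock_registry, inherits = FALSE)
}

demo_path <- file.path(tempdir(), "demo_study")
lock_path(demo_path)
locked_before <- is_path_locked(demo_path)
unlock_path(demo_path)
locked_after <- is_path_locked(demo_path)
```

Because the registry lives only in memory, every lock disappears when the R session ends and nothing is written to disk.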
List Studies Available for Download
Description
Returns a data frame of all clinical trial studies available as GitHub
Release assets, along with their local cache status. Studies marked as
cached = TRUE are already downloaded and available for use with
connect_clinical_data() without an internet connection.
When GitHub is unreachable, the function falls back to the last
successfully fetched listing (if available) and issues a warning.
The cached column is always recomputed from the local filesystem.
Requires the piggyback package.
Usage
list_available_studies(repo = "Lovemore-Gakava/clinTrialData")
Arguments
repo |
GitHub repository in the form |
Value
A data frame with columns:
- source
Study name (pass this to download_study() or connect_clinical_data())
- version
Release tag the asset belongs to
- size_mb
Asset size in megabytes
- cached
TRUE if the study is already in the local cache
Examples
list_available_studies()
List Available Clinical Data Sources
Description
Returns information about all clinical datasets available locally –
both datasets bundled with the package and any datasets previously
downloaded via download_study(). The location column indicates
whether a dataset is "bundled" (shipped with the package) or
"cached" (downloaded to the user cache directory).
To see datasets available for download from GitHub, use
list_available_studies().
Usage
list_data_sources()
Value
A data frame with columns:
- source
Dataset name (pass to connect_clinical_data())
- description
Human-readable study description
- domains
Comma-separated list of available data domains (e.g. "adam, sdtm")
- format
Storage format ("parquet")
- location
Either "bundled" or "cached"
Examples
list_data_sources()
Lock all study folders
Description
Locks all study folders under a base path (in-memory).
Usage
lock_all_studies(base_path = "inst/exampledata", reason = "Package installed")
Arguments
base_path |
Base path to the exampledata directory |
reason |
Optional reason for the lock |
Value
Invisible character vector of locked folder paths
Lock a study folder
Description
Marks a study path as locked for the duration of the current R session.
On Unix-like systems, cached study directories are also made read-only
at the file-system level via Sys.chmod().
Usage
lock_study(study_path, reason = "Package installed")
Arguments
study_path |
Path to the study folder |
reason |
Optional reason for the lock (included in messages only) |
Value
Logical indicating success, invisibly
Remove Content with Lock Check
Description
S3 method for remove_cnt that checks if the study folder is locked before allowing remove operations.
Usage
## S3 method for class 'ConnectorLockedFS'
remove_cnt(connector_object, name, ...)
Arguments
connector_object |
The ConnectorLockedFS object |
name |
The file name to remove |
... |
Additional arguments passed to the underlying connector |
Value
Invisible connector_object
Unlock a study folder
Description
Removes the in-memory lock on a study path, allowing write operations for the remainder of the current R session. On Unix-like systems, also restores write permissions on cached study directories.
Usage
unlock_study(study_path)
Arguments
study_path |
Path to the study folder |
Value
Logical indicating success, invisibly
Wrap Connectors with Lock Protection
Description
Recursively wraps all ConnectorFS objects with lock protection.
Usage
wrap_connectors_with_locks(obj, study_path)
Arguments
obj |
A connectors object or connector object |
study_path |
Path to the study folder |
Value
The wrapped object
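The recursive wrapping can be illustrated with plain lists standing in for connector objects. This is a sketch under assumptions: `wrap_sketch` and `fake_fs` are hypothetical, real connector objects are not plain lists, and storing the study path as an attribute is a guess at how the wrapper might remember it.

```r
# Hypothetical sketch using plain lists as stand-ins for connector objects.
wrap_sketch <- function(obj, study_path) {
  if (inherits(obj, "ConnectorFS")) {
    attr(obj, "study_path") <- study_path
    class(obj) <- c("ConnectorLockedFS", class(obj))
    return(obj)
  }
  if (is.list(obj)) {
    # Recurse into containers while keeping their names and class
    obj[] <- lapply(obj, wrap_sketch, study_path = study_path)
  }
  obj
}

fake_fs <- function(path) structure(list(path = path), class = "ConnectorFS")
db <- structure(
  list(adam = fake_fs("adam"), sdtm = fake_fs("sdtm"), note = "not a connector"),
  class = "connectors"
)
wrapped <- wrap_sketch(db, study_path = "cdisc_pilot")
```

Prepending the new class, rather than replacing the old one, lets methods for ConnectorLockedFS run first and then defer to the ConnectorFS behaviour.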
Write Content with Lock Check
Description
S3 method for write_cnt that checks if the study folder is locked before allowing write operations.
Usage
## S3 method for class 'ConnectorLockedFS'
write_cnt(connector_object, x, name, overwrite = FALSE, ...)
Arguments
connector_object |
The ConnectorLockedFS object |
x |
The data to write |
name |
The file name |
overwrite |
Whether to overwrite existing files |
... |
Additional arguments passed to the underlying connector |
Value
Invisible connector_object
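The lock check before delegating to the underlying connector can be sketched with S3 dispatch and NextMethod(). To stay runnable without the connector package, the sketch defines its own stand-in generic `write_demo` and stores the lock state in a `locked` field; both are assumptions, as is the choice to warn rather than raise an error.

```r
# Stand-in generic so the sketch runs without the connector package
write_demo <- function(connector_object, x, name, ...) UseMethod("write_demo")

# Plain file-system behaviour: save the object under the connector's path
write_demo.ConnectorFS <- function(connector_object, x, name, ...) {
  saveRDS(x, file.path(connector_object$path, name))
  invisible(connector_object)
}

# Locked variant: refuse when locked, otherwise defer to the method above
write_demo.ConnectorLockedFS <- function(connector_object, x, name, ...) {
  if (isTRUE(connector_object$locked)) {
    warning("Study folder is locked; nothing written.", call. = FALSE)
    return(invisible(connector_object))
  }
  NextMethod()
}

con <- structure(
  list(path = tempdir(), locked = TRUE),
  class = c("ConnectorLockedFS", "ConnectorFS")
)
target <- file.path(tempdir(), "lock_demo.rds")
unlink(target)

suppressWarnings(write_demo(con, mtcars, "lock_demo.rds"))
written_while_locked <- file.exists(target)

con$locked <- FALSE
write_demo(con, mtcars, "lock_demo.rds")
written_after_unlock <- file.exists(target)
```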