---
title: "Chapter 02: Using a ported library — assembling kernel programs"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Chapter 02: Using a ported library — assembling kernel programs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, eval = FALSE}
library(opencltools)
```

# Introduction

OpenCL kernels are not self‑contained: any kernel that calls statistical
math functions (e.g. `dgamma`, `pnorm5`, `dbinom_raw`) depends on
OpenCL‑C ports of those functions to be compiled and loaded alongside it.
`opencltools` provides the tools to manage those dependencies — loading,
sorting, and assembling library shards into a single program string —
without the calling package needing to know the internal structure of the
library it is using.

This chapter uses `nmathopencl` as the concrete worked example.  `nmathopencl`
ships an OpenCL‑C port of R's Mathlib (nmath) as a directory of annotated
`.cl` shards.  The same `opencltools` functions work with any library that
follows the same annotation convention.

For OpenCL setup and device verification, see Chapter 01.

---

# 1. Anatomy of a ported library

[`nmathopencl`](https://knygren.r-universe.dev/nmathopencl) organizes its OpenCL-C source into a top-level configuration
file and seven subdirectories under `inst/cl/`.  The layout mirrors how R
itself layers its headers and runtime — each directory provides exactly what
the directories after it need:

| Load order | Directory / file | Contents |
|:----------:|------------------|----------|
| 1 | `OPENCL.cl` | Global OpenCL configuration: `cl_khr_fp64` / `cl_khr_printf` extensions, IEEE constants (`ML_NAN`, `ML_POSINF`), feature‑detection macros for `expm1`/`log1p` |
| 2 | `libR_shims/` | Host‑runtime compatibility shims: `R_pow`, `R_pow_di`, `R_CheckStack` — the bottom‑most primitive operations that everything above depends on |
| 3 | `R_ext_types/` | Type definitions: `Rboolean`, `FALSE`/`TRUE`, and related type aliases (mirrors R's `Boolean.h` etc.) |
| 4 | `R_shims/` | R configuration and define shims: `Rconfig.cl`, `Rdefines.cl`, `Rinternals.cl` |
| 5 | `R_ext_runtime/` | Memory, error, and I/O interface shims: `error`, `warning`, `Rf_error`, `Rf_warning` — mirrors R's `Error.h`, `Memory.h`, etc. |
| 6 | `R_ext_internals/` | Internal R extension definitions (less commonly needed by nmath directly) |
| 7 | `System/` | System‑level integer type definitions: `int64_t`, `uint64_t`, etc. (replaces `<stdint.h>`) |
| 8 | `nmath/` | The ported Mathlib sources (~137 `.cl` shards including `Rmath.cl`, `nmath.cl`, `dpq.cl`, `refactored.cl`, and all distribution function files) |

The ordering is not arbitrary — it matches the header include chain that
R's `nmath.h` follows.  Each layer provides symbols the next layer assumes
are already defined.  `load_kernel_library` performs a topological sort
*within* each subdirectory; the subdirectories themselves must be loaded
in the order above.

Each `.cl` shard carries annotation comments that `opencltools` reads to
determine within-directory dependency order.

## 1.1 Shard annotations

Every library shard declares what it provides and what it depends on:

```c
// @source_type:  c
// @source_origin: dgamma.c
// @provides:    dgamma
// @depends:     dpois, nmath, dpq
// @all_depends: dpq, Rmath, nmath, stirlerr, ...   (auto-generated)
// @load_order:  99                                  (auto-generated)
```

- **`@provides`** — symbols this file exports (used to build the
  symbol→shard map).
- **`@depends`** — other shards (by stem) this file needs.
- **`@all_depends`** and **`@load_order`** — computed by
  `attach_kernel_dependency_tags()` / `write_kernel_dependency_index()`;
  do not edit manually.

---

# 2. The nmath module

The `nmath/` subdirectory of [`nmathopencl`](https://knygren.r-universe.dev/nmathopencl) contains ~137 ported `.cl` shards.
It includes the anchor files `Rmath.cl` (all distribution function declarations
and mathematical constants), `nmath.cl` (the core Mathlib header macros),
`dpq.cl` (R‑style density/CDF macros for `give_log`/`lower_tail` logic), and
`refactored.cl` (forward declarations needed to break circular dependencies in
the OpenCL single-pass compilation model), plus the individual distribution
function files.

The table below lists the key infrastructure shards and the shards most
relevant for GLM / envelope computations:

| Shard | `@provides` | Purpose |
|-------|-------------|---------|
| `Rmath.cl` | All Rmath constants and all distribution function signatures | Master declaration header — equivalent to R's `Rmath.h` |
| `nmath.cl` | `ML_*`, `ME_*`, `ISNAN`, `R_FINITE`, `ML_ERROR`, and many private helpers | Core Mathlib header macros and private declarations |
| `dpq.cl` | `R_D__0`, `R_DT_val`, `R_D_exp`, … | R‑style density/CDF macros for `give_log` / `lower_tail` logic |
| `refactored.cl` | Forward declarations for cycle‑broken functions | Needed because OpenCL C is single‑pass; breaks mutual call cycles in the nmath graph |
| `log1p.cl` | `log1p`, `Rlog1p` | log(1+x) for numerical stability |
| `expm1.cl` | `expm1` | exp(x)−1 for numerical stability |
| `bd0.cl` | `bd0`, `ebd0` | Poisson/binomial deviance terms |
| `stirlerr.cl` | `stirlerr` | Stirling error; dispatches to the two cycle‑broken fragments below |
| `stirlerr_cycle_free.cl` | `stirlerr_cycle_free` | Table‑lookup path for small arguments (split from `stirlerr.c` to break cycle) |
| `stirlerr_cycle_dependent.cl` | `stirlerr_cycle_dependent` | Series path for large arguments |
| `pgamma_utils.cl` | `log1pmx`, `lgamma1p` | Utilities extracted from `pgamma.c` to break cycle |
| `lgamma.cl` | `lgammafn`, `lgammafn_sign` | Log‑gamma |
| `gamma.cl` | `gammafn` | Gamma function |
| `lgammacor.cl` | `lgammacor` | Series correction for large log-gamma arguments |
| `chebyshev.cl` | `chebyshev_init`, `chebyshev_eval` | Called by `lgammacor` |
| `cospi.cl` | `cospi`, `sinpi`, `tanpi` | `sinpi` used in the negative-argument reflection formula for `gammafn` |
| `fmax2.cl` / `fmin2.cl` | `fmax2`, `fmin2` | Called by `gammalims` and others |
| `gammalims.cl` | `gammalims` | Gamma function overflow/underflow bounds |
| `dbinom.cl` | `dbinom`, `dbinom_raw`, `pow1p` | Binomial density |
| `dpois.cl` | `dpois`, `dpois_raw` | Poisson density |
| `dgamma.cl` | `dgamma` | Gamma density |
| `dnorm.cl` | `dnorm`, `dnorm4` | Normal density |
| `pnorm.cl` | `pnorm`, `pnorm_both`, `pnorm5` | Normal CDF (probit) |
| `pgamma.cl` | `pgamma` | Gamma CDF |

`pnorm.cl` and `dnorm.cl` are self‑contained: their only dependencies are
the `nmath.cl` infrastructure shim and the `dpq`‑style macros — no additional
math files are pulled in.  `dbinom.cl` and `dgamma.cl` share almost the entire
gamma function stack (`lgamma`, `gamma`, `lgammacor`, `chebyshev`, `cospi`,
`gammalims`, `fmax2`, `pgamma_utils`, `stirlerr`, `bd0`).

These ports ensure that OpenCL kernels produce results **numerically
consistent** with R's CPU path.

---

# 3. Loading kernel source files

## 3.1 Single file

`load_kernel_source()` loads one `.cl` file from a package's `inst/cl/`
directory and returns its contents as a character string:

```{r, eval = FALSE}
# Load the global configuration preamble
opencl_cl <- load_kernel_source("OPENCL.cl", package = "nmathopencl")

# Load a specific kernel
f2_src <- load_kernel_source("src/f2_f3_gaussian.cl", package = "nmathopencl")
```

The returned string is ready to concatenate or pass directly to
`clCreateProgramWithSource`.

## 3.2 A whole library directory

`load_kernel_library()` loads **all** `.cl` files in a subdirectory,
performs a **dependency-aware topological sort** (using `@depends`
annotations), and returns their contents concatenated in the correct load
order — so that every shard appears after all the shards it depends on:

```{r, eval = FALSE}
# Load one of nmathopencl's dependency layers
nmath_src <- load_kernel_library("nmath", package = "nmathopencl")
```

Each of `nmathopencl`'s seven subdirectories is a separate call.  Together
they are the building blocks of a complete OpenCL program string (see §4).

---

# 4. Program assembly

An OpenCL program that calls nmath functions must include all dependency
layers in a fixed order.  The order is not arbitrary — it mirrors the header
include chain that R's `nmath.h` follows: each layer defines symbols the next
layer assumes are already visible.

The canonical assembly sequence used by `glmbayes` is:

```
program_source =
    OPENCL.cl                                    # 1. global config, extensions, macros
  + libR_shims   (load_kernel_library("libR_shims"))    # 2. R_pow, R_pow_di, R_CheckStack
  + R_ext_types  (load_kernel_library("R_ext_types"))   # 3. Rboolean, TRUE/FALSE, type aliases
  + R_shims      (load_kernel_library("R_shims"))       # 4. Rconfig, Rdefines, Rinternals shims
  + R_ext_runtime(load_kernel_library("R_ext_runtime")) # 5. error, warning, memory shims
  + R_ext_internals(load_kernel_library("R_ext_internals")) # 6. internal R extension defs
  + System       (load_kernel_library("System"))        # 7. int64_t, uint64_t (stdint shim)
  + nmath        (load_kernel_library("nmath"))         # 8. all ported Mathlib (~137 shards)
  + kernel file  (load_kernel_source("src/..."))        # 9. your model-specific kernel
```

Steps 2–8 together constitute the **nmathopencl layer**: they make every
nmath function — `lgamma`, `lbeta`, `dbinom`, `dpois`, `pnorm`, and so on —
available as device‑side functions inside the kernel at step 9.

In C++ inside a kernel runner (the pattern used by `glmbayes`):

```cpp
std::string all_src =
    load_kernel_source("OPENCL.cl")
  + "\n" + load_kernel_library("libR_shims",      "nmathopencl")
  + "\n" + load_kernel_library("R_ext_types",     "nmathopencl")
  + "\n" + load_kernel_library("R_shims",         "nmathopencl")
  + "\n" + load_kernel_library("R_ext_runtime",   "nmathopencl")
  + "\n" + load_kernel_library("R_ext_internals", "nmathopencl")
  + "\n" + load_kernel_library("System",          "nmathopencl")
  + "\n" + load_kernel_library("nmath",           "nmathopencl")
  + "\n" + load_kernel_source("src/f2_f3_gaussian.cl");
```

Or equivalently in R:

```{r, eval = FALSE}
library(opencltools)
pkg <- "nmathopencl"

all_src <- paste(
  load_kernel_source("OPENCL.cl"),
  load_kernel_library("libR_shims",      package = pkg),
  load_kernel_library("R_ext_types",     package = pkg),
  load_kernel_library("R_shims",         package = pkg),
  load_kernel_library("R_ext_runtime",   package = pkg),
  load_kernel_library("R_ext_internals", package = pkg),
  load_kernel_library("System",          package = pkg),
  load_kernel_library("nmath",           package = pkg),
  load_kernel_source("src/f2_f3_gaussian.cl", package = pkg),
  sep = "\n"
)
```

The resulting `all_src` string is passed to `clCreateProgramWithSource`.
`load_kernel_library` performs a topological sort within each subdirectory
so you do not need to enumerate individual files — you only nominate a
subdirectory name.

---

# 5. Minimal subsetting — loading only the nmath shards a kernel needs

The full assembly in §4 loads all ~137 nmath shards.  For performance it is
better to include only the shards a specific kernel actually requires — this
reduces the program source size and cuts first-call just-in-time (JIT) compilation time.

`load_library_for_kernel()` reads the `@all_depends_nmath` annotation from
a kernel file (written by `attach_cross_library_tags()`, see §6) and returns
the source of exactly those nmath shards, in dependency order:

```{r, eval = FALSE}
nmath_dir   <- system.file("cl/nmath", package = "nmathopencl")
kernel_path <- system.file("cl/src/f2_f3_gaussian.cl", package = "nmathopencl")

# Returns only the nmath shards the gaussian kernel needs (dnorm + its deps)
nmath_subset <- load_library_for_kernel(
  kernel_path,
  library_dir = nmath_dir,
  depends_tag = "all_depends_nmath"
)
# Warnings fire automatically for any stems with known portability issues
```

Substitute this result for the full `load_kernel_library("nmath", ...)` call
in §4 while keeping all other layers unchanged — the prelude layers
(`libR_shims`, `R_ext_types`, etc.) are always loaded in full.

This function is implemented in C++ and exposed to R as a convenience
wrapper, so it can be called from `*.cpp` inside a kernel runner without
going back into R.

---

# 6. Annotating your kernel files

When you write a new kernel that calls nmath (or any other ported library),
two `opencltools` functions handle all the dependency annotation — no manual
tagging is needed beyond one line declaring which library you use.

## 6.1 Declare the library (one line, written by you)

Add a single comment at the top of your kernel file:

```c
// @library_deps: nmath
__kernel void my_kernel(...) {
  double v = dgamma(x[j], shape, scale, 0);
  ...
}
```

## 6.2 Step 1 — scan source, write direct-call tags

`attach_kernel_call_tags()` builds the library's symbol→shard map from
`@provides` annotations, scans your kernel source for matching calls,
and writes `@calls_nmath` and `@depends_nmath` automatically:

```{r, eval = FALSE}
nmath_dir <- system.file("cl/nmath", package = "nmathopencl")

attach_kernel_call_tags(
  kernel_paths = list.files("inst/cl/src", "\\.cl$", full.names = TRUE),
  library_dir  = nmath_dir,
  library_tag  = "nmath"
)
```

After this step your kernel file contains:

```c
// @library_deps: nmath
// @calls_nmath: dgamma
// @depends_nmath: dgamma
// @calls_opencl_builtin: (none)
```

## 6.3 Step 2 — compute transitive closure

`attach_cross_library_tags()` reads `@depends_nmath`, walks the
pre-built dependency index, and writes `@all_depends_nmath` — the
complete ordered list of every shard needed:

```{r, eval = FALSE}
attach_cross_library_tags(
  kernel_paths = list.files("inst/cl/src", "\\.cl$", full.names = TRUE),
  library_dir  = nmath_dir,
  depends_tag  = "depends_nmath"
)
```

After this step the kernel file contains the full annotation block:

```c
// @library_deps: nmath
// @calls_nmath: dgamma
// @depends_nmath: dgamma
// @calls_opencl_builtin: (none)
// @all_depends_nmath_count: 19
// @all_depends_nmath: dpq, Rmath, nmath, stirlerr_cycle_free, chebyshev, ...
```

Re-run both steps after editing your kernel and adding or removing
library calls.

---

# 7. Verifying portability

Before wiring a kernel into production code you can verify that every
function it needs has been ported and is unlikely to cause problems.
`opencltools` maintains a curated `opencl_known_failures.json` and surfaces
warnings automatically when you call `load_library_for_kernel()`.  The
warnings identify stems with known portability issues so you can decide
whether to proceed or find an alternative.

---

# 8. Maintaining a library index

If you build or update a library of annotated `.cl` shards (e.g. after
adding new ported functions), regenerate the dependency index:

```{r, eval = FALSE}
write_kernel_dependency_index(
  library_dir = "inst/cl/nmath",
  output_dir  = "inst/cl/nmath"
)
```

The resulting `kernel_dependency_index.rds` is read by
`load_library_for_kernel()` and `attach_cross_library_tags()`.  It is
pre-built for `nmathopencl` and does not need to be regenerated unless you
fork or extend the library.

---

# Cross-references

- **Chapter 01** — OpenCL installation and device verification
- **Chapter 03** — Kernel design pattern: the runner/wrapper approach
- **`?load_kernel_source`**, **`?load_kernel_library`** — loading functions
- **`?load_library_for_kernel`** — minimal subsetting
- **`?attach_kernel_call_tags`** — source-scanning annotation (Step 1)
- **`?attach_cross_library_tags`** — transitive closure annotation (Step 2)
- **`?write_kernel_dependency_index`** — index maintenance
