---
title: "Validation: is your power number trustworthy?"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Validation: is your power number trustworthy?}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
set.seed(1)
```

A power estimate is only as trustworthy as the test it is built on. If a
design/analysis combination does not control its Type I error rate, a high
"power" number is meaningless --- the test is just rejecting too often, under the
null as well as the alternative. mixpower makes this check a first-class step.

```{r}
library(mixpower)
```

## Calibrate: does the test hold its alpha?

`mp_calibrate()` runs the scenario under the null (the focal effect set to zero)
and estimates the empirical Type I error rate, with an exact interval and a
verdict. For a correctly specified model it should sit near `alpha`.

```{r}
d <- mp_design(clusters = list(subject = 24), trials_per_cell = 6)
a <- mp_assumptions(
  fixed_effects = list(`(Intercept)` = 0, condition = 0.4),
  random_effects = list(subject = list(intercept_sd = 0.5)),
  residual_sd = 1
)
scn <- mp_scenario_lme4(y ~ condition + (1 | subject), design = d, assumptions = a)

mp_calibrate(scn, nsim = 60, seed = 11)
```

### Catching a misspecified model

The classic trap: the data have a by-subject random slope, but the analysis
model omits it. This inflates Type I error (Barr et al., 2013). `mp_calibrate()`
flags it.

```{r}
a_slope <- mp_assumptions(
  fixed_effects = list(`(Intercept)` = 0, condition = 0.4),
  random_effects = list(subject = list(
    intercept_sd = 0.5, slopes = list(condition = 0.8)
  )),
  residual_sd = 1
)
# Data have the slope; the fitted model (1 | subject) ignores it.
scn_mis <- mp_scenario_lme4(y ~ condition + (1 | subject), design = d, assumptions = a_slope)

mp_calibrate(scn_mis, nsim = 60, seed = 7)
```

The verdict turns to `anti-conservative`: any power computed from this model
would be inflated. The fix is to fit the maximal model `(1 + condition |
subject)`, which mixpower simulates and tests directly.

## Recommend: which inference method?

Wald (z/t) tests are anti-conservative when the number of clusters is small,
because their degrees of freedom are overstated (Luke, 2017).
`mp_recommend_method()` gives fast, design-based guidance.

```{r}
scn_few <- mp_scenario_lme4(
  y ~ condition + (1 | subject),
  design = mp_design(list(subject = 12), trials_per_cell = 8),
  assumptions = a
)
mp_recommend_method(scn_few)
```

With few clusters it steers you toward a degrees-of-freedom-corrected test
(Satterthwaite or Kenward-Roger for LMMs) or a parametric bootstrap. You can
then *measure* the improvement with `mp_calibrate()`:

```{r, eval = requireNamespace("pbkrtest", quietly = TRUE) && requireNamespace("lmerTest", quietly = TRUE)}
scn_kr <- mp_scenario_lme4(
  y ~ condition + (1 | subject),
  design = mp_design(list(subject = 12), trials_per_cell = 8),
  assumptions = a, test_method = "kenward-roger"
)
mp_calibrate(scn_kr, nsim = 40, seed = 3)$type1
```

## Reporting

Calibration results drop into the same reporting layer as power results:

```{r}
mp_report_table(mp_calibrate(scn, nsim = 40, seed = 1))
```

Run `mp_calibrate()` whenever you change the design size or the analysis model.
A trustworthy power analysis reports both the power and the calibration that
backs it.