The package includes utilities to reproduce the size and power studies. They are Monte Carlo intensive; the examples below use small replication counts for speed.
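Under a true null, a well-calibrated test rejects at approximately the nominal level (about 5% of replications at a 5% level). The test based on the asymptotic reference distribution over-rejects, while the bootstrap version brings the rejection rate back to approximately nominal size.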
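The size comparison can be illustrated with a small, self-contained Monte Carlo sketch. It does not use the package's utilities: `t_stat` and `size_study` are hypothetical names, and the setup (a studentized mean with n = 8 normal observations, a parametric bootstrap under the null) is an assumption chosen only because it shows the same pattern, with normal critical values rejecting too often and the bootstrap rejecting near the nominal level.

```python
import numpy as np
from statistics import NormalDist

def t_stat(x):
    """Studentized mean along the last axis (tests H0: mean = 0)."""
    n = x.shape[-1]
    return np.sqrt(n) * x.mean(axis=-1) / x.std(axis=-1, ddof=1)

def size_study(n=8, reps=2000, boot=199, level=0.05, seed=0):
    """Empirical rejection rates under a true null (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, n))            # data generated under H0
    stat = np.abs(t_stat(x))

    # Asymptotic reference: two-sided standard normal critical value.
    crit = NormalDist().inv_cdf(1 - level / 2)
    reject_asym = stat > crit

    # Parametric bootstrap under H0: draw from N(0, s^2) in each replication.
    s = x.std(axis=1, ddof=1)
    xb = rng.standard_normal((reps, boot, n)) * s[:, None, None]
    stat_b = np.abs(t_stat(xb))                   # shape (reps, boot)
    p_boot = (1 + (stat_b >= stat[:, None]).sum(axis=1)) / (boot + 1)
    reject_boot = p_boot <= level

    return float(reject_asym.mean()), float(reject_boot.mean())

asym_rate, boot_rate = size_study()
print(f"asymptotic: {asym_rate:.3f}  bootstrap: {boot_rate:.3f}")
```

With so few replications the rates carry visible Monte Carlo error, so treat them as rough indications rather than reportable sizes.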
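Power can be estimated against a set of alternatives, with optional size correction (typically, critical values taken from the simulated null distribution) so that tests are compared at the same actual size: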
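As a rough sketch of what a size-corrected power study involves, the code below estimates rejection rates over a grid of mean shifts. `abs_t`, `power_study`, and the `size_correct` flag are hypothetical and are not the package's API. Size correction is assumed here to mean replacing the nominal critical value with the empirical quantile of the statistic simulated under the null.

```python
import numpy as np
from statistics import NormalDist

def abs_t(x):
    """Absolute studentized mean along the last axis."""
    n = x.shape[-1]
    return np.abs(np.sqrt(n) * x.mean(axis=-1) / x.std(axis=-1, ddof=1))

def power_study(alternatives, n=8, reps=2000, level=0.05,
                size_correct=False, seed=0):
    """Rejection rate at each mean shift in `alternatives` (illustrative only)."""
    rng = np.random.default_rng(seed)
    if size_correct:
        # Empirical critical value: (1 - level) quantile of the null statistic.
        null_stats = abs_t(rng.standard_normal((reps, n)))
        crit = float(np.quantile(null_stats, 1 - level))
    else:
        # Nominal asymptotic critical value (two-sided standard normal).
        crit = NormalDist().inv_cdf(1 - level / 2)

    power = {}
    for mu in alternatives:
        x = mu + rng.standard_normal((reps, n))   # data under the alternative
        power[mu] = float((abs_t(x) > crit).mean())
    return power

shifts = [0.0, 0.5, 1.0]
print("raw:           ", power_study(shifts))
print("size-corrected:", power_study(shifts, size_correct=True))
```

The entry at a shift of 0.0 is the empirical size. Without correction, a test that over-rejects under the null also appears more powerful, which is why the corrected figures are the fairer basis for comparison.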