| Version: | 3.1.9 |
| Date: | 2026-1-28 |
| Title: | Two One-Sided Tests for Equivalence |
| Imports: | combinat, Hmisc, index0, lm.beta, mathjaxr, rlang, stringr, webuse |
| Depends: | R (≥ 3.5.0) |
| RdMacros: | mathjaxr |
| BuildManual: | TRUE |
| Description: | Ports the 'Stata' ado package 'tost' which provides a suite of commands to perform two one-sided tests for equivalence following the approach by Schuirmann (1987) <doi:10.1007/BF01068419>. Commands are provided for t tests on means, z tests on proportions, McNemar's test (1947) <doi:10.1007/BF02295996> on proportions and related tests, tests on the regression coefficients from OLS linear regression (not yet implementing all of the current regression options from the 'Stata' 'tostregress' command, e.g., survey regression options, estimation options, etc.), Wilcoxon's (1945) <doi:10.2307/3001968> signed rank tests, Wilcoxon-Mann-Whitney (1947) <doi:10.1214/aoms/1177730491> rank sum tests, supporting inference about equivalence for a number of paired and unpaired, parametric and nonparametric study designs and data types. Each command tests a null hypothesis that samples were drawn from populations different by at least plus or minus some researcher-defined level of tolerance, which can be defined in terms of units of the data or rank units (Delta), or in units of the test statistic's distribution (epsilon) except for tost.rrp() and tost.rrpi(). Enough evidence rejects this null hypothesis in favor of equivalence within the tolerance. Equivalence intervals for all tests may be defined symmetrically or asymmetrically. |
| License: | GPL-2 |
| LazyData: | no |
| Encoding: | UTF-8 |
| NeedsCompilation: | no |
| Packaged: | 2026-02-06 20:32:33 UTC; alexis |
| Author: | Alexis Dinno |
| Maintainer: | Alexis Dinno <alexis.dinno@pdx.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-02-09 20:10:02 UTC |
Health Protection Branch of Canada equivalence trial for a generic drug
Description
Example of doctor evaluation of two different drugs (one a test drug, and one a reference drug) as either “effective” or “ineffective” as described on page 276 of Tu (1997).
Usage
data(canada)
Format
A data frame containing two binary variables, drug, where 0 means “Ineffective” and 1 means “Effective” and group, where 1 means “Test drug” and 2 means “reference drug” in 201 observations.
References
Tu, D. (1997) Two one-sided tests procedures in establishing therapeutic equivalence with binary clinical endpoints: Fixed sample performances and sample size determination. Journal of Statistical Computation and Simulation 59, 271–290.
Outcomes of an HIV screening test
Description
Example of two different tests—one from a blood plasma sample, and one from an alternate body fluid sample, neither being a ‘gold standard’ test—giving HIV positive and HIV negative status based on research by Lachenbruch and Lynch (1998).
Usage
data(hivfluid)
Format
A data frame containing two binary variables, plasma and alternate, where 1 means “HIV Positive” and 1 means “HIV Negative” in 1157 observations.
References
Lachenbruch, P. A. and Lynch, C. J. (1998) Assessing screening tests: Extensions of McNemar's test. Statistics In Medicine 17, 2207–2217.
Paired z test for equivalence of marginal probabilities in binary data
Description
Performs two one-sided z tests for equivalence of marginal probabilities in binary data
Usage
tost.mcc(
x = NA,
y = NA,
frequency = NA,
eqv.type = equivalence.types,
eqv.level = 1,
upper = NA,
ccontinuity = continuity.correction.methods,
conf.level = 0.95,
relevance = TRUE)
equivalence.types
#c("delta", "epsilon")
continuity.correction.methods
#c("none", "yates", "edwards")
Arguments
x |
a (non-empty) vector of binary data values of equal length to |
y |
a (non-empty) vector of binary data values of equal length to |
frequency |
an optional (non-empty) vector of equal length to |
eqv.type |
defines whether the equivalence interval will be defined in terms of \(\Delta\) or \(\varepsilon\) ( |
eqv.level |
defines the equivalence threshold for the tests depending on whether |
upper |
defines the upper equivalence threshold for the test, is assumed to be positive, and transforms the meaning of |
conf.level |
confidence level of the interval, and complement of the test's nominal type I error rate \(\alpha\). |
ccontinuity |
calculates test statistics for both positivist and negativist tests using a continuity correction. The default is |
relevance |
reports results and inference for combined tests for difference and for equivalence for a specific |
Details
tost.mcc tests for equivalence of the marginal probabilities of exposure in matched case-control data. It calculates a Wald-type asymptotic \(z\) test (Liu, et al., 2002) in a two one-sided tests approach (Schuirmann, 1987). tost.mcci is the immediate form of tost.mcc. Typically the null hypotheses of the corresponding McNemar's \(\chi^{2}\) test (McNemar, 1947) for difference in marginal probabilities are framed from an assumption of equality of marginal probability of exposure between cases and controls (e.g., \(\text{H}^{+}_{0}: \frac{b}{n} - \frac{c}{n} = 0\)), rejecting this assumption only with sufficient evidence. When performing tests for equivalence of marginal probabilities, the null hypothesis is instead that the marginal probabilities differ by at least as much as the equivalence interval defined by some chosen level of tolerance (as specified by eqv.type and eqv.level).
With respect to a \(z\) test, a negativist null hypothesis takes one of the following two forms depending on whether tolerance is defined in terms of \(\Delta\) (equivalence expressed in the units of the marginal probability of counts of discordant pairs) or in terms of \(\varepsilon\) (equivalence expressed in the units of the \(z\) distribution):
\(\phantom{22}\text{H}_{0}^{-}\text{: }\left|\frac{b}{n} - \frac{c}{n}\right| \ge \Delta\),
\(\phantom{22}\)where the equivalence interval ranges from \(\left(\frac{b}{n} - \frac{c}{n}\right) - \Delta\) to \(\left(\frac{b}{n} - \frac{c}{n}\right) + \Delta\), and where \(b\) is the count of pairs with cases exposed but controls unexposed, and \(c\) is the count of pairs with cases unexposed and controls exposed. This translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{ H}_{01}^{-}\text{: }\frac{b}{n} - \frac{c}{n} \ge \Delta\), or
\(\phantom{2222}\text{ H}_{02}^{-}\text{: }\frac{b}{n} - \frac{c}{n} \le -\Delta\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }|Z| \ge \varepsilon ,\)
\(\phantom{22}\)where the equivalence interval ranges from \(-\varepsilon\) to \(\varepsilon\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }Z \ge \varepsilon\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }Z \le -\varepsilon\).
When an asymmetric equivalence interval is defined using the upper option the general negativist null hypothesis becomes:
\(\phantom{22}\text{H}_{0}^{-}\text{: }\frac{b}{n} - \frac{c}{n} \le \Delta_{\text{lower}}\), or \(\frac{b}{n} - \frac{c}{n} \ge \Delta_{\text{upper}}\)
\(\phantom{22}\)where the equivalence interval ranges from \(\left(\frac{b}{n} - \frac{c}{n}\right) + \Delta_{\text{lower}}\) to \(\left(\frac{b}{n} - \frac{c}{n}\right) + \Delta_{\text{upper}}\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }\frac{b}{n} - \frac{c}{n} \ge \Delta_{\text{upper}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }\frac{b}{n} - \frac{c}{n} \le \Delta_{\text{lower}}\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }Z \le \varepsilon_{\text{lower}}\), or \(Z \ge \varepsilon_{\text{upper}}\), with:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }Z \ge \varepsilon_{\text{upper}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }Z \le \varepsilon_{\text{lower}}\).
NOTE: the appropriate level of \(\alpha = (1 - \)conf.level\()\) is precisely the same as in the corresponding two-sided test for difference, so that, for example, if one wishes to make a type I error 1% of the time, one simply conducts both of the one-sided tests of \(\text{H}_{01}^{-}\) and \(\text{H}_{02}^{-}\) by comparing each resulting p-value to 0.01 (Tryon and Lewis, 2008; Wellek, 2010).
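The two one-sided tests in the \(\Delta\) form can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the function name tost_sketch, its arguments, and the particular Wald standard error used here are assumptions, and no continuity correction is applied.

```r
# Illustrative sketch only: tost_sketch() is a hypothetical helper, not part
# of the package. It assumes the usual Wald standard error for (b - c)/n.
tost_sketch <- function(b, c_count, n, delta, conf.level = 0.95) {
  alpha <- 1 - conf.level
  theta <- (b - c_count) / n                                # estimated difference
  se    <- sqrt((b + c_count) - (b - c_count)^2 / n) / n    # assumed Wald SE
  z1 <- (delta - theta) / se    # tests H01: theta >= Delta
  z2 <- (theta + delta) / se    # tests H02: theta <= -Delta
  p1 <- pnorm(z1, lower.tail = FALSE)
  p2 <- pnorm(z2, lower.tail = FALSE)
  list(estimate   = theta,
       statistics = c(z1 = z1, z2 = z2),
       p.values   = c(p1 = p1, p2 = p2),
       # equivalence is concluded only if BOTH one-sided nulls are rejected,
       # each at the full alpha (not alpha/2)
       reject     = (p1 <= alpha) && (p2 <= alpha))
}

res <- tost_sketch(b = 8, c_count = 3, n = 27, delta = 0.2)
res$p.values
res$reject
```

Each one-sided p-value is compared to \(\alpha\) itself, as the note above describes, and the negativist null hypothesis is rejected only when both comparisons succeed.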
Remarks
As described by Tryon and Lewis (2008), when rejection decisions from both tests for difference (e.g., \(\text{H}_{0}^{+}\text{: }\frac{b}{n} - \frac{c}{n} = 0\) or \(\text{H}^{+}_{0}\text{: Z = 0}\)) and tests for equivalence (e.g., either \(\text{H}_{0}^{-}\text{: }\left|\frac{b}{n} - \frac{c}{n}\right| \ge \Delta\), or \(\text{H}_{0}^{-}\text{: }|Z| \ge \varepsilon\)) are combined, there are four possible interpretations for a given \(\alpha\) and \(\Delta\) or \(\varepsilon\):
One may reject \(\text{H}_{0}^{+}\), but fail to reject \(\text{H}_{0}^{-}\), and conclude that there is a relevant difference in marginal proportions at least as large as \(\Delta\) or \(\varepsilon\).
One may fail to reject \(\text{H}_{0}^{+}\), but reject \(\text{H}_{0}^{-}\), and conclude that there is equivalence in marginal proportions within the equivalence range (i.e. defined by \(\Delta\) or \(\varepsilon\)).
One may reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and conclude that there is a trivial difference in marginal proportions which lies within the equivalence range (i.e. defined by \(\Delta\) or \(\varepsilon\)).
One may fail to reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and draw an indeterminate conclusion, because the data are underpowered to detect either difference or equivalence.
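The four interpretations above amount to a lookup on two rejection decisions. A minimal sketch follows; the function name relevance_conclusion and the returned labels are illustrative assumptions and may not match the strings the package reports in its conclusion value.

```r
# Hypothetical helper: maps the two rejection decisions to one of the four
# interpretations described by Tryon and Lewis (2008).
relevance_conclusion <- function(reject_difference, reject_equivalence) {
  if (reject_difference && !reject_equivalence) {
    "relevant difference"
  } else if (!reject_difference && reject_equivalence) {
    "equivalence"
  } else if (reject_difference && reject_equivalence) {
    "trivial difference"
  } else {
    "indeterminate"
  }
}

relevance_conclusion(reject_difference = FALSE, reject_equivalence = TRUE)
```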
Value
tost.mcc returns:
statistics |
a vector containing the value of \(z_{1}\) and \(z_{2}\); if |
p.values |
a vector of p values for the z tests. |
estimate |
the estimated difference in proportion with exposure. |
threshold |
a scalar containing the equivalence threshold when |
conclusion |
relevance test conclusion for a given \(\alpha\) and \(\Delta\) or \(\varepsilon\). |
Author(s)
Alexis Dinno (alexis.dinno@pdx.edu)
Please contact me with any questions, bug reports or suggestions for improvement. Fixing bugs will be facilitated by sending along:
a copy of the data (de-labeled or anonymized is fine),
a copy of the command syntax used, and
a copy of the exact output of the command.
Suggested citation
Dinno, A. 2025. tost.mcc: Paired z test for equivalence of marginal probabilities in binary data. In: tost.suite R software package. URL: https://alexisdinno.com/Software/index.shtml#tost
References
Edwards, A. (1948) Note on the “correction for continuity” in testing the significance of the difference between correlated proportions. Psychometrika 13, 185–187.
Liu, J., et al., (2002) Tests for equivalence or non-inferiority for paired binary data. Statistics In Medicine 21, 231–245.
McNemar, Q. (1947) Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 12, 153–157
Schuirmann, D. A. (1987) A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics. 15, 657–680.
Tryon, W. W., and C. Lewis. (2008) An inferential confidence interval method of establishing statistical equivalence that corrects Tryon's (2001) reduction factor. Psychological Methods. 13, 272–277
Yates, F. (1934) Contingency tables involving small numbers and the \(\chi^2\) test. Supplement to the Journal of the Royal Statistical Society. 1, 217–235.
Wellek, S. (2010) Testing Statistical Hypotheses of Equivalence and Noninferiority, second edition. Chapman and Hall/CRC Press. p. 31
See Also
Examples
require("webuse")
# Setup
webuse("mccxmpl")
# Relevance test in paired binary data
tost.mcc(
x=mccxmpl$case,
y=mccxmpl$control,
frequency=mccxmpl$pop,
eqv.type="delta",
eqv.level=.2,
relevance=TRUE)
Immediate paired z test for equivalence of marginal probabilities in binary data
Description
Immediately performs two one-sided z tests for equivalence of marginal probabilities in binary data
Usage
tost.mcci(
a = NA, b = NA, c = NA, d = NA,
eqv.type = equivalence.types,
eqv.level = 1,
upper = NA,
ccontinuity = continuity.correction.methods,
conf.level = 0.95,
relevance = TRUE)
equivalence.types
#c("delta", "epsilon")
continuity.correction.methods
#c("none", "yates", "edwards")
Arguments
a |
a non-negative integer indicating the number of paired observations with both cases and controls exposed. |
b |
a non-negative integer indicating the number of paired observations with cases exposed and controls unexposed. |
c |
a non-negative integer indicating the number of paired observations with cases unexposed and controls exposed. |
d |
a non-negative integer indicating the number of paired observations with both cases and controls unexposed. |
eqv.type |
defines whether the equivalence interval will be defined in terms of \(\Delta\) or \(\varepsilon\) ( |
eqv.level |
defines the equivalence threshold for the tests depending on whether |
upper |
defines the upper equivalence threshold for the test, is assumed to be positive, and transforms the meaning of |
conf.level |
confidence level of the interval, and complement of the test's nominal type I error rate \(\alpha\). |
ccontinuity |
calculates test statistics for both positivist and negativist tests using a continuity correction. The default is |
relevance |
reports results and inference for combined tests for difference and for equivalence for a specific |
Details
Immediate commands perform tests given summary statistics, rather than given data. tost.mcci tests for equivalence of the marginal probabilities of exposure in matched case-control data. It calculates a Wald-type asymptotic \(z\) test (Liu, et al., 2002) in a two one-sided tests approach (Schuirmann, 1987). tost.mcc is the non-immediate form of tost.mcci. Typically the null hypotheses of the corresponding McNemar's \(\chi^{2}\) test (McNemar, 1947) for difference in marginal probabilities are framed from an assumption of equality of marginal probability of exposure between cases and controls (e.g., \(\text{H}^{+}_{0}: \frac{b}{n} - \frac{c}{n} = 0\)), rejecting this assumption only with sufficient evidence. When performing tests for equivalence of marginal probabilities, the null hypothesis is instead that the marginal probabilities differ by at least as much as the equivalence interval defined by some chosen level of tolerance (as specified by eqv.type and eqv.level).
With respect to a \(z\) test, a negativist null hypothesis takes one of the following two forms depending on whether tolerance is defined in terms of \(\Delta\) (equivalence expressed in the units of the marginal probability of counts of discordant pairs) or in terms of \(\varepsilon\) (equivalence expressed in the units of the \(z\) distribution):
\(\phantom{22}\text{H}_{0}^{-}\text{: }\left|\frac{b}{n} - \frac{c}{n}\right| \ge \Delta\),
\(\phantom{22}\)where the equivalence interval ranges from \(\left(\frac{b}{n} - \frac{c}{n}\right) - \Delta\) to \(\left(\frac{b}{n} - \frac{c}{n}\right) + \Delta\), and where \(b\) is the count of pairs with cases exposed but controls unexposed, and \(c\) is the count of pairs with cases unexposed and controls exposed. This translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{ H}_{01}^{-}\text{: }\frac{b}{n} - \frac{c}{n} \ge \Delta\), or
\(\phantom{2222}\text{ H}_{02}^{-}\text{: }\frac{b}{n} - \frac{c}{n} \le -\Delta\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }|Z| \ge \varepsilon ,\)
\(\phantom{22}\)where the equivalence interval ranges from \(-\varepsilon\) to \(\varepsilon\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }Z \ge \varepsilon\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }Z \le -\varepsilon\).
When an asymmetric equivalence interval is defined using the upper option the general negativist null hypothesis becomes:
\(\phantom{22}\text{H}_{0}^{-}\text{: }\frac{b}{n} - \frac{c}{n} \le \Delta_{\text{lower}}\), or \(\frac{b}{n} - \frac{c}{n} \ge \Delta_{\text{upper}}\)
\(\phantom{22}\)where the equivalence interval ranges from \(\left(\frac{b}{n} - \frac{c}{n}\right) + \Delta_{\text{lower}}\) to \(\left(\frac{b}{n} - \frac{c}{n}\right) + \Delta_{\text{upper}}\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }\frac{b}{n} - \frac{c}{n} \ge \Delta_{\text{upper}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }\frac{b}{n} - \frac{c}{n} \le \Delta_{\text{lower}}\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }Z \le \varepsilon_{\text{lower}}\), or \(Z \ge \varepsilon_{\text{upper}}\), with:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }Z \ge \varepsilon_{\text{upper}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }Z \le \varepsilon_{\text{lower}}\).
NOTE: the appropriate level of \(\alpha = (1 - \)conf.level\()\) is precisely the same as in the corresponding two-sided test for difference, so that, for example, if one wishes to make a type I error 1% of the time, one simply conducts both of the one-sided tests of \(\text{H}_{01}^{-}\) and \(\text{H}_{02}^{-}\) by comparing each resulting p-value to 0.01 (Tryon and Lewis, 2008; Wellek, 2010).
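The cell counts a, b, c and d that the immediate form expects can be tabulated from paired binary data in base R. The sketch below uses hypothetical frequency-form data whose counts mirror the first example for this command; the vector names and the coding (1 = exposed, 0 = unexposed) are assumptions for illustration.

```r
# Hypothetical paired data in frequency form; 1 = exposed, 0 = unexposed.
case    <- c(1, 1, 0, 0)
control <- c(1, 0, 1, 0)
pop     <- c(8, 8, 3, 8)   # number of pairs with each case/control pattern

# Cross-tabulate the weighted pairs: rows are cases, columns are controls.
cells <- xtabs(pop ~ case + control)

counts <- c(
  a = cells["1", "1"],  # both case and control exposed
  b = cells["1", "0"],  # case exposed, control unexposed
  c = cells["0", "1"],  # case unexposed, control exposed
  d = cells["0", "0"]   # both unexposed
)
counts
sum(counts)             # n, the total number of pairs
```

Only the discordant counts b and c enter the estimated difference \(\frac{b}{n} - \frac{c}{n}\); a and d contribute through \(n\).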
Remarks
As described by Tryon and Lewis (2008), when rejection decisions from both tests for difference (e.g., \(\text{H}_{0}^{+}\text{: }\frac{b}{n} - \frac{c}{n} = 0\) or \(\text{H}^{+}_{0}\text{: Z = 0}\)) and tests for equivalence (e.g., either \(\text{H}_{0}^{-}\text{: }\left|\frac{b}{n} - \frac{c}{n}\right| \ge \Delta\), or \(\text{H}_{0}^{-}\text{: }|Z| \ge \varepsilon\)) are combined, there are four possible interpretations for a given \(\alpha\) and \(\Delta\) or \(\varepsilon\):
One may reject \(\text{H}_{0}^{+}\), but fail to reject \(\text{H}_{0}^{-}\), and conclude that there is a relevant difference in marginal proportions at least as large as \(\Delta\) or \(\varepsilon\).
One may fail to reject \(\text{H}_{0}^{+}\), but reject \(\text{H}_{0}^{-}\), and conclude that there is equivalence in marginal proportions within the equivalence range (i.e. defined by \(\Delta\) or \(\varepsilon\)).
One may reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and conclude that there is a trivial difference in marginal proportions which lies within the equivalence range (i.e. defined by \(\Delta\) or \(\varepsilon\)).
One may fail to reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and draw an indeterminate conclusion, because the data are underpowered to detect either difference or equivalence.
Value
tost.mcci returns:
statistics |
a vector containing the value of \(z_{1}\) and \(z_{2}\); if |
p.values |
a vector of p values for the z tests. |
estimate |
the estimated difference in proportion with exposure. |
threshold |
a scalar containing the equivalence threshold when |
conclusion |
relevance test conclusion for a given \(\alpha\) and \(\Delta\) or \(\varepsilon\). |
Author(s)
Alexis Dinno (alexis.dinno@pdx.edu)
Please contact me with any questions, bug reports or suggestions for improvement. Fixing bugs will be facilitated by sending along:
a copy of the data (de-labeled or anonymized is fine),
a copy of the command syntax used, and
a copy of the exact output of the command.
Suggested citation
Dinno, A. 2025. tost.mcci: Paired z test for equivalence of marginal probabilities in binary data. In: tost.suite R software package.
References
Edwards, A. (1948) Note on the “correction for continuity” in testing the significance of the difference between correlated proportions. Psychometrika 13, 185–187.
Liu, J., et al., (2002) Tests for equivalence or non-inferiority for paired binary data. Statistics In Medicine 21, 231–245.
McNemar, Q. (1947) Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 12, 153–157
Schuirmann, D. A. (1987) A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics. 15, 657–680.
Tryon, W. W., and C. Lewis. (2008) An inferential confidence interval method of establishing statistical equivalence that corrects Tryon's (2001) reduction factor. Psychological Methods. 13, 272–277
Yates, F. (1934) Contingency tables involving small numbers and the \(\chi^2\) test. Supplement to the Journal of the Royal Statistical Society. 1, 217–235.
Wellek, S. (2010) Testing Statistical Hypotheses of Equivalence and Noninferiority, second edition. Chapman and Hall/CRC Press. p. 31
See Also
Examples
# Immediate command for the relevance test in paired binary data in the help file
# for tost.mcc
tost.mcci(
a=8, b=8, c=3, d=8,
eqv.type="delta",
eqv.level=.2,
relevance=TRUE)
# Different example with an asymmetric interval; the lower end of the equivalence
# interval = qnorm(.95)+.5 = 2.144854 meaning equivalence must lie no more
# than 0.5 sd beyond the critical value of Z for alpha = 0.05. The upper end of
# the equivalence interval = qnorm(.95)+1 = 2.644854 meaning equivalence
# must lie no more than 1 sd beyond the critical value of Z for alpha = 0.05.
tost.mcci(
a=4, b=9, c=8, d=5,
eqv.type="epsilon",
eqv.level=qnorm(.95)+.5,
upper=qnorm(.95)+1,
relevance=TRUE)
Mean-equivalence z tests
Description
Performs two one-sided z tests for proportion equivalence
Usage
tost.pr(
x,
y = NULL,
by = NULL,
by.names = NULL,
p0 = NA,
eqv.type = equivalence.types,
eqv.level = 1,
upper = NA,
ccontinuity = continuity.correction.methods,
conf.level = 0.95,
x.name = "",
y.name = "",
relevance = TRUE)
equivalence.types
#c("delta", "epsilon")
continuity.correction.methods
#c("none", "yates", "ha")
Arguments
x |
a (non-empty) vector of binary data values. |
y |
an optional (non-empty) vector of binary data values. |
by |
an optional (non-empty) vector of group indicator values |
by.names |
an optional two-element character vector of group names. If none are supplied, the values of |
p0 |
a number indicating the true value of the proportion for a one-sample test. Implies |
eqv.type |
defines whether the equivalence interval will be defined in terms of \(\Delta\) or \(\varepsilon\) ( |
eqv.level |
defines the equivalence threshold for the tests depending on whether |
upper |
defines the upper equivalence threshold for the test, is assumed to be positive, and transforms the meaning of |
conf.level |
confidence level of the interval, and complement of the test's nominal type I error rate \(\alpha\). |
x.name |
specifies how the first variable will be labeled in the output. The default value of |
y.name |
specifies how the second variable will be labeled in the output. The default value of |
ccontinuity |
calculates test statistics for both positivist and negativist tests using a continuity correction. The default is |
relevance |
reports results and inference for combined tests for difference and for equivalence for a specific |
Details
tost.pr tests for the equivalence of proportions within a symmetric equivalence interval defined by eqv.type and eqv.level (or within an asymmetric interval when adding the upper argument) using a two one-sided z tests (TOST) approach (Schuirmann, 1987). Typically “positivist” null hypotheses are framed from an assumption of a lack of difference between two quantities, and reject this assumption only with sufficient evidence. When performing tests for equivalence, one instead frames a null hypothesis with the assumption that two quantities differ by at least as much as an equivalence interval defined by some chosen level of tolerance.
With respect to an unpaired z test, an equivalence null hypothesis takes one of the following two forms depending on whether equivalence is defined in terms of \(\Delta\) (equivalence expressed in the same units as proportions of the x and y variables) or in terms of \(\varepsilon\) (equivalence expressed in the units of the z distribution):
\(\phantom{22}\text{H}_{0}^{-}\text{: }|p_{x} - p_y| \ge \Delta\),
\(\phantom{22}\)where the equivalence interval ranges from \(\left(p_x - p_y\right) - \Delta\) to \(\left(p_x - p_y\right) + \Delta\). This translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{ H}_{01}^{-}\text{: }p_{x} - p_y \ge \Delta\), or
\(\phantom{2222}\text{ H}_{02}^{-}\text{: }p_{x} - p_y \le -\Delta\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }|Z| \ge \varepsilon ,\)
\(\phantom{22}\)where the equivalence interval ranges from \(-\varepsilon\) to \(\varepsilon\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }Z \ge \varepsilon\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }Z \le -\varepsilon\).
When an asymmetric equivalence interval is defined using the upper option the general negativist null hypothesis becomes:
\(\phantom{22}\text{H}_{0}^{-}\text{: }p_{x} - p_y \le \Delta_{\text{lower}}\), or \(p_{x} - p_y \ge \Delta_{\text{upper}}\)
\(\phantom{22}\)where the equivalence interval ranges from \(\left(p_x - p_y\right) + \Delta_{\text{lower}}\) to \(\left(p_x - p_y\right) + \Delta_{\text{upper}}\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }p_x - p_y \ge \Delta_{\text{upper}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }p_x - p_y \le \Delta_{\text{lower}}\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }Z \le \varepsilon_{\text{lower}}\), or \(Z \ge \varepsilon_{\text{upper}}\), with:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }Z \ge \varepsilon_{\text{upper}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }Z \le \varepsilon_{\text{lower}}\).
NOTE: the appropriate level of \(\alpha = (1 - \)conf.level\()\) is precisely the same as in the corresponding two-sided test for difference, so that, for example, if one wishes to make a type I error 1% of the time, one simply conducts both of the one-sided tests of \(\text{H}_{01}^{-}\) and \(\text{H}_{02}^{-}\) by comparing each resulting p-value to 0.01 (Tryon and Lewis, 2008; Wellek, 2010).
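The \(\Delta\) and \(\varepsilon\) forms are linked through the standard error of the estimated difference: a tolerance of \(\Delta\) in proportion units corresponds to \(\varepsilon = \Delta / s\) in units of the z distribution. The sketch below illustrates this with hypothetical summary counts and assumes an unpooled Wald standard error, which may differ from the standard error the package uses.

```r
# Hypothetical two-group summary data (not from any dataset in the package).
x1 <- 30; n1 <- 100    # successes and sample size, group x
x2 <- 42; n2 <- 120    # successes and sample size, group y

p_x <- x1 / n1
p_y <- x2 / n2
d   <- p_x - p_y

# Assumed unpooled Wald standard error of the difference in proportions.
se <- sqrt(p_x * (1 - p_x) / n1 + p_y * (1 - p_y) / n2)

delta   <- 0.2           # tolerance in units of the proportions
epsilon <- delta / se    # the same tolerance in units of the z distribution

z  <- d / se
z1 <- epsilon - z        # equals (delta - d) / se, tests H01
z2 <- z + epsilon        # equals (d + delta) / se, tests H02

p.values <- pnorm(c(z1, z2), lower.tail = FALSE)
p.values
```

Under this assumption the two parameterizations give identical test statistics, so the choice between them is a matter of which units make the tolerance easier to justify.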
Remarks
As described by Tryon and Lewis (2008), when rejection decisions from both tests for difference (e.g., \(\text{H}_{0}^{+}\text{: }p_{x}- p_{y} = 0\) or \(\text{H}^{+}_{0}\text{: Z = 0}\)) and tests for equivalence (e.g., either \(\text{H}_{0}^{-}\text{: }|p_{x}- p_{y}| \ge \Delta\), or \(\text{H}_{0}^{-}\text{: }|Z| \ge \varepsilon\)) are combined, there are four possible interpretations for a given \(\alpha\) and \(\Delta\) or \(\varepsilon\):
One may reject \(\text{H}_{0}^{+}\), but fail to reject \(\text{H}_{0}^{-}\), and conclude that there is a relevant difference in proportions at least as large as \(\Delta\) or \(\varepsilon\).
One may fail to reject \(\text{H}_{0}^{+}\), but reject \(\text{H}_{0}^{-}\), and conclude that there is equivalence in proportions within the equivalence range (i.e. defined by \(\Delta\) or \(\varepsilon\)).
One may reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and conclude that there is a trivial difference in proportions which lies within the equivalence range (i.e. defined by \(\Delta\) or \(\varepsilon\)).
One may fail to reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and draw an indeterminate conclusion, because the data are underpowered to detect either difference or equivalence.
Value
tost.pr returns:
statistics |
a vector of the z statistics for the two one-sided tests; if |
p.values |
a vector of p values for the z tests. |
proportion |
a scalar estimate of the sample proportion in the one-sample test. A vector of the proportions in both groups, as well as the estimate of the proportion under the null hypothesis in the two-sample test. |
sample_size |
a scalar containing the sample size of the one-sample test. A vector of the sample size in both groups, as well as the combined sample size in the two-sample test. |
threshold |
a scalar containing the equivalence threshold when |
conclusion |
a string containing the relevance test conclusion when |
Author(s)
Alexis Dinno (alexis.dinno@pdx.edu)
Please contact me with any questions, bug reports or suggestions for improvement. Fixing bugs will be facilitated by sending along:
a copy of the data (de-labeled or anonymized is fine),
a copy of the command syntax used, and
a copy of the exact output of the command.
I am indebted to my winter 2013 and fall 2023 students for their inspiration. Much appreciation to Mick McVeety for troubleshooting the translation of my Stata tost package to R.
Suggested citation
Dinno, A. 2025. tost.pr: Mean-equivalence z tests. In: tost.suite R software package. URL: https://alexisdinno.com/Software/index.shtml#tost
References
Hauck, W. W., and Anderson, S. (1984) A new statistical procedure for testing equivalence in two-group comparative bioavailability trials. Journal of Pharmacokinetics and Pharmacodynamics. 12, 83–91.
Hauck, W. W., and Anderson, S. (1986) A comparison of large-sample confidence interval methods for the difference of two binomial probabilities. The American Statistician. 40, 318–322.
Schuirmann, D. A. (1987) A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics. 15, 657–680.
Tryon, W. W., and Lewis, C. (2008) An inferential confidence interval method of establishing statistical equivalence that corrects Tryon’s (2001) reduction factor. Psychological Methods. 13, 272–277
Tu, D. (1997) Two one-sided tests procedures in establishing therapeutic equivalence with binary clinical endpoints: Fixed sample performances and sample size determination. Journal of Statistical Computation and Simulation. 59, 271–290.
Yates, F. (1934) Contingency tables involving small numbers and the \(\chi^2\) test. Supplement to the Journal of the Royal Statistical Society. 1, 217–235.
Wellek, S. (2010) Testing Statistical Hypotheses of Equivalence and Noninferiority, second edition. Chapman and Hall/CRC Press. p. 31
See Also
Examples
require("webuse")
# Setup
webuse("auto")
# One-sample proportion equivalence test with asymmetric equivalence interval
tost.pr(
auto$foreign,
p0=0.4,
eqv.type="delta",
eqv.level=.15,
upper=.2,
relevance=FALSE)
# Setup
webuse("cure")
# Two-sample proportion relevance test; equivalence interval is +/- 1 sd
# beyond the critical value of Z for alpha = 0.05
tost.pr(
x=cure$cure1,
y=cure$cure2,
eqv.type="epsilon",
eqv.level=qnorm(.95)+1,
conf.level=0.95,
relevance=TRUE)
# Setup
data("canada")
# Two-group proportion equivalence test from Tu 1997, p 276, and incorporating
# a Hauck and Anderson continuity correction from that same example.
tost.pr(
x=canada$drug,
by=canada$group,
eqv.type="delta",
eqv.level=.2,
ccontinuity="ha",
conf.level=0.95,
relevance=FALSE)
Immediate one- and two-sample z tests for proportion equivalence
Description
Immediately performs two one-sided z tests for proportion equivalence
Usage
tost.pri(
n1 = NA, obs1 = NA, n2 = NA, obs2 = NA, count = FALSE,
eqv.type = equivalence.types,
eqv.level = 1,
upper = NA,
ccontinuity = continuity.correction.methods,
conf.level = 0.95,
x.name = "x",
y.name = "y",
relevance = TRUE)
equivalence.types
#c("delta", "epsilon")
continuity.correction.methods
#c
Arguments
n1 |
required group 1 sample size. |
obs1 |
required group 1 sample proportion if |
n2 |
an optional group 2 sample size. If |
obs2 |
required true proportion (\(p_0\)) for the one-sample test when |
count |
optionally indicates whether |
eqv.type |
defines whether the equivalence interval will be defined in terms of \(\Delta\) or \(\varepsilon\) ( |
eqv.level |
defines the equivalence threshold for the tests depending on whether |
upper |
defines the upper equivalence threshold for the test, is assumed to be positive, and transforms the meaning of |
conf.level |
confidence level of the interval, and complement of the test's nominal type I error rate \(\alpha\). |
x.name |
specifies how the first group will be labeled in the output. The default value of |
y.name |
specifies how the second group will be labeled in the output. The default value of |
ccontinuity |
calculates test statistics for both positivist and negativist tests using a continuity correction. The default is |
relevance |
reports results and inference for combined tests for difference and for equivalence for a specific |
Details
Immediate commands perform tests given summary statistics, rather than given data. tost.pri tests for the equivalence of proportions within a symmetric equivalence interval defined by eqv.type and eqv.level (or within an asymmetric interval when adding the upper argument) using a two one-sided z tests (TOST) approach (Schuirmann, 1987). Typically "positivist" null hypotheses are framed from an assumption of a lack of difference between two quantities, and reject this assumption only with sufficient evidence. When performing tests for equivalence, one frames a "negativist" null hypothesis with the assumption that two quantities are different by at least as much as an equivalence interval defined by some chosen level of tolerance.
With respect to an unpaired z test, an equivalence null hypothesis takes one of the following two forms depending on whether equivalence is defined in terms of \(\Delta\) (equivalence expressed in the same units as the proportions of the two variables) or in terms of \(\varepsilon\) (equivalence expressed in the units of the z distribution):
\(\phantom{22}\text{H}_{0}^{-}\text{: }|p_{x} - p_y| \ge \Delta\),
\(\phantom{22}\)where the equivalence interval ranges from \(\left(p_x - p_y\right) - \Delta\) to \(\left(p_x - p_y\right) + \Delta\). This translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{ H}_{01}^{-}\text{: }p_{x} - p_y \ge \Delta\), or
\(\phantom{2222}\text{ H}_{02}^{-}\text{: }p_{x} - p_y \le -\Delta\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }|Z| \ge \varepsilon ,\)
\(\phantom{22}\)where the equivalence interval ranges from \(-\varepsilon\) to \(\varepsilon\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }Z \ge \varepsilon\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }Z \le -\varepsilon\).
When an asymmetric equivalence interval is defined using the upper option the general negativist null hypothesis becomes:
\(\phantom{22}\text{H}_{0}^{-}\text{: }p_{x} - p_y \le \Delta_{\text{lower}}\), or \(p_{x} - p_y \ge \Delta_{\text{upper}}\)
\(\phantom{22}\)where the equivalence interval ranges from \(\left(p_x - p_y\right) + \Delta_{\text{lower}}\) to \(\left(p_x - p_y\right) + \Delta_{\text{upper}}\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }p_x - p_y \ge \Delta_{\text{upper}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }p_x - p_y \le \Delta_{\text{lower}}\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }Z \le \varepsilon_{\text{lower}}\), or \(Z \ge \varepsilon_{\text{upper}}\), with:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }Z \ge \varepsilon_{\text{upper}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }Z \le \varepsilon_{\text{lower}}\).
NOTE: the appropriate level of \(\alpha = (1 - \)conf.level\()\) is precisely the same as in the corresponding two-sided test for proportion difference, so that, for example, if one wishes to make a type I error 1% of the time, one simply conducts both of the one-sided tests of \(\text{H}_{01}^{-}\) and \(\text{H}_{02}^{-}\) by comparing the resulting p-values to 0.01 (Tryon and Lewis, 2008; Wellek, 2010).
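The two one-sided hypotheses above can be illustrated with a minimal base R sketch. It is not the package's implementation: the function name `tost_two_prop_sketch` is hypothetical, and the sketch assumes an unpooled standard error, no continuity correction, a symmetric interval in \(\Delta\) units, and upper-tail p-values for both tests.

```r
# Sketch of two one-sided z tests for a difference in two proportions.
# Assumptions (not taken from the package source): unpooled standard error,
# no continuity correction, upper-tail p-values for both one-sided tests.
tost_two_prop_sketch <- function(nx, px, ny, py, delta, alpha = 0.05) {
  d  <- px - py
  se <- sqrt(px * (1 - px) / nx + py * (1 - py) / ny)
  z1 <- (delta - d) / se   # large z1 is evidence against H01: px - py >= delta
  z2 <- (d + delta) / se   # large z2 is evidence against H02: px - py <= -delta
  p.values <- c(p1 = pnorm(z1, lower.tail = FALSE),
                p2 = pnorm(z2, lower.tail = FALSE))
  list(statistics = c(z1 = z1, z2 = z2),
       p.values   = p.values,
       epsilon    = delta / se,              # the same tolerance in z units
       equivalent = all(p.values <= alpha))  # reject both one-sided nulls
}

res <- tost_two_prop_sketch(nx = 101, px = 41 / 101, ny = 100, py = 0.49,
                            delta = 0.2)
print(res)
```

Under these assumptions the two statistics always sum to \(2\Delta\) divided by the standard error, which is one way to see how a tolerance in \(\Delta\) units maps to one in \(\varepsilon\) units.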
Remarks
As described by Tryon and Lewis (2008), when rejection decisions from both tests for difference (e.g., \(\text{H}_{0}^{+}\text{: }p_{x}- p_{y} = 0\)) and tests for equivalence (e.g., either \(\text{H}_{0}^{-}\text{: }|p_{x}- p_{y}| \ge \Delta\), or \(\text{H}_{0}^{-}\text{: }|Z| \ge \varepsilon\)) are combined, there are four possible interpretations for a given \(\alpha\) and \(\Delta\) or \(\varepsilon\):
One may reject \(\text{H}_{0}^{+}\), but fail to reject \(\text{H}_{0}^{-}\), and conclude that there is a relevant difference in proportions at least as large as \(\Delta\) or \(\varepsilon\).
One may fail to reject \(\text{H}_{0}^{+}\), but reject \(\text{H}_{0}^{-}\), and conclude that there is equivalence in proportions within the equivalence range (i.e. defined by \(\Delta\) or \(\varepsilon\)).
One may reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and conclude that there is a trivial difference in proportions which lies within the equivalence range (i.e. defined by \(\Delta\) or \(\varepsilon\)).
One may fail to reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and draw an indeterminate conclusion, because the data are underpowered to detect either difference or equivalence.
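The four interpretations listed above amount to a two-by-two decision table. A minimal sketch follows; the function name `relevance_conclusion_sketch` and the returned labels are hypothetical, and it assumes that the positivist p-value is two-sided and that equivalence requires rejecting both one-sided null hypotheses at the same \(\alpha\).

```r
# Map rejection decisions to the four relevance-test conclusions.
# p_pos: p-value of the test for difference (assumed two-sided)
# p1, p2: p-values of the two one-sided tests for equivalence
relevance_conclusion_sketch <- function(p_pos, p1, p2, alpha = 0.05) {
  reject_pos <- p_pos <= alpha
  reject_neg <- (p1 <= alpha) && (p2 <= alpha)
  if (reject_pos && !reject_neg) {
    "relevant difference"
  } else if (!reject_pos && reject_neg) {
    "equivalence"
  } else if (reject_pos && reject_neg) {
    "trivial difference"
  } else {
    "indeterminate"
  }
}

print(relevance_conclusion_sketch(p_pos = 0.40, p1 = 0.01, p2 = 0.03))
```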
Value
tost.pri returns:
statistics |
a vector of the z statistics for the two one-sided tests; if |
p.values |
a vector of p values for the z tests. |
proportion |
a scalar estimate of the sample proportion in the one-sample test. A vector of the proportions in both groups, as well as the estimate of the proportion under the null hypothesis in the two-sample test. |
sample_size |
a scalar containing the sample size of the one-sample test. A vector of the sample size in both groups, as well as the combined sample size in the two-sample test. |
threshold |
a scalar containing the equivalence threshold when |
conclusion |
a string containing the relevance test conclusion when |
Author(s)
Alexis Dinno (alexis.dinno@pdx.edu)
Please contact me with any questions, bug reports or suggestions for improvement. Fixing bugs will be facilitated by sending along:
a copy of the data (de-labeled or anonymized is fine),
a copy of the command syntax used, and
a copy of the exact output of the command.
I am indebted to my winter 2013 and fall 2023 students for their inspiration. Much appreciation to Mick McVeety for troubleshooting the translation of my Stata tost package to R.
Suggested citation
Dinno, A. 2025. tost.pri: Mean-equivalence z tests. In: tost.suite R software package.
References
Hauck, W. W. and S. Anderson. (1984) A new statistical procedure for testing equivalence in two-group comparative bioavailability trials. Journal of Pharmacokinetics and Pharmacodynamics. 12, 83–91.
Hauck, W. W. and Anderson, S. (1986) A comparison of large-sample confidence interval methods for the difference of two binomial probabilities. The American Statistician. 40, 318–322.
Schuirmann, D. A. (1987) A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics. 15, 657–680.
Tryon, W. W., and C. Lewis. (2008) An inferential confidence interval method of establishing statistical equivalence that corrects Tryon's (2001) reduction factor. Psychological Methods. 13, 272–277.
Yates, F. (1934) Contingency tables involving small numbers and the \(\chi^2\) test. Supplement to the Journal of the Royal Statistical Society. 1, 217–235.
Wellek, S. (2010) Testing Statistical Hypotheses of Equivalence and Noninferiority, second edition. Chapman and Hall/CRC Press. p. 31.
See Also
Examples
# Immediate form of one-sample z test for proportion equivalence
# Note warning about value of Delta!
tost.pri(
n1=50,
obs1=.52,
obs2=.70,
eqv.type="delta",
eqv.level=.1,
relevance=FALSE)
# First two numbers are counts; equivalence interval is +/- 1 sd
# beyond the critical value of Z for alpha = 0.05
tost.pri(
n1=30,
obs1=4,
obs2=.70,
eqv.type="epsilon",
eqv.level=qnorm(.95)+1,
count=TRUE,
conf.level=0.95,
relevance=TRUE)
# Immediate form of two-sample z test for proportion equivalence using an
# example from Tu 1997, p 276, and incorporating the Hauck and Anderson
# continuity correction from that same example.
tost.pri(
n1=101,
obs1=.40594059,
n2=100,
obs2=.49,
eqv.type="delta",
eqv.level=.2,
ccontinuity="ha",
relevance=FALSE)
# The same example, but all numbers are counts
tost.pri(
n1=101,
obs1=41,
n2=100,
obs2=49,
eqv.type="delta",
eqv.level=.2,
count=TRUE,
ccontinuity="ha",
relevance=FALSE)
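The last two calls describe the same data, once as proportions and once as counts. A quick base R check of that correspondence (the variable names are hypothetical, chosen only for this illustration):

```r
# Counts from the example above
n1 <- 101; obs1_count <- 41
n2 <- 100; obs2_count <- 49

# Proportions implied by those counts
obs1_prop <- obs1_count / n1
obs2_prop <- obs2_count / n2
print(round(c(obs1_prop, obs2_prop), 8))
```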
Two-sample rank sum test for stochastic equivalence
Description
Performs two one-sided approximate z tests for stochastic equivalence between two independent samples.
Usage
tost.rank.sum(
x, by,
eqv.type = equivalence.types,
eqv.level = 1,
upper = NA,
conf.level = 0.95,
x.name = "",
by.name = "",
by.values = NULL,
ccontinuity = FALSE,
relevance = TRUE)
equivalence.types
#c("delta", "epsilon")
Arguments
x |
a numeric vector of data values. |
by |
a numeric or factor vector of exactly two values indicating group membership. |
eqv.type |
defines whether the equivalence interval will be defined in terms of \(\varepsilon\) or \(\Delta\) ( |
eqv.level |
defines the equivalence threshold for the tests depending on whether |
upper |
defines the upper equivalence threshold for the test, is assumed to be positive, and transforms the meaning of |
conf.level |
confidence level of the interval, and complement of the test's nominal type I error rate \(\alpha\). |
x.name |
specifies how the outcome variable will be labeled in the output. The default value of |
by.name |
specifies how the grouping variable will be labeled in the output. The default value of |
by.values |
a string vector of exactly two values specifying how group names will be labeled in the output. The default value of |
ccontinuity |
calculates test statistics for both positivist and negativist tests using a continuity correction. For the positivist test the approximate statistic \(z = \tfrac{\text{sgn}(W)\times(|W-\mu_{W}|-0.5)}{\sigma_{W}}\). |
relevance |
reports results and inference for combined tests for difference and for equivalence for a specific |
Details
tost.rank.sum tests the null hypothesis that the distributions of two independent samples differ by at least a chosen tolerance in the sense of stochastic dominance, and provides evidence of stochastic equivalence between the two groups. tost.rank.sum uses the z approximation to the rank sum test (Wilcoxon, 1945; Mann and Whitney, 1947) in a two one-sided tests approach (Schuirmann, 1987).
With respect to the rank sum test, a negativist null hypothesis takes one of the following two forms depending on whether tolerance is defined in terms of \(\Delta\) (equivalence expressed in units of rank sums) or in terms of \(\varepsilon\) (equivalence expressed in the units of the z distribution):
\(\phantom{22}\text{H}_{0}^{-}\text{: }|W - \mu_W| \ge \Delta\),
\(\phantom{22}\)where the equivalence interval ranges from \(\left(W - \mu_W\right) - \Delta\) to \(\left(W - \mu_W\right) + \Delta\). This translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{ H}_{01}^{-}\text{: }W - \mu_W \ge \Delta\), or
\(\phantom{2222}\text{ H}_{02}^{-}\text{: }W - \mu_W \le -\Delta\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }|Z| \ge \varepsilon ,\)
\(\phantom{22}\)where the equivalence interval ranges from \(-\varepsilon\) to \(\varepsilon\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }Z \ge \varepsilon\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }Z \le -\varepsilon\).
When an asymmetric equivalence interval is defined using the upper option the general negativist null hypothesis becomes:
\(\phantom{22}\text{H}_{0}^{-}\text{: }W - \mu_W \le \Delta_{\text{l}}\), or \(W - \mu_W \ge \Delta_{\text{u}}\)
\(\phantom{22}\)where the equivalence interval ranges from \(\left(W - \mu_W\right) + \Delta_{\text{l}}\) to \(\left(W - \mu_W\right) + \Delta_{\text{u}}\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }W - \mu_W \ge \Delta_{\text{u}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }W - \mu_W \le \Delta_{\text{l}}\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }Z \le \varepsilon_{\text{l}}\), or \(Z \ge \varepsilon_{\text{u}}\), with:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }Z \ge \varepsilon_{\text{u}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }Z \le \varepsilon_{\text{l}}\).
NOTE: the appropriate level of \(\alpha = (1 - \)conf.level\()\) is precisely the same as in the corresponding two-sided test for difference, so that, for example, if one wishes to make a type I error 1% of the time, one simply conducts both of the one-sided tests of \(\text{H}_{01}^{-}\) and \(\text{H}_{02}^{-}\) by comparing the resulting p-values to 0.01 (Wellek, 2010).
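A minimal base R sketch of the symmetric \(\varepsilon\) form of these hypotheses follows. It is an illustration rather than the package's code: the function name `tost_rank_sum_sketch` is hypothetical, the variance is not adjusted for ties, no continuity correction is applied, and both p-values are taken from the upper tail.

```r
# Sketch of two one-sided z tests built on the rank sum of the first sample.
# Assumptions: no tie adjustment, no continuity correction.
tost_rank_sum_sketch <- function(x, y, epsilon, alpha = 0.05) {
  n1 <- length(x)
  n2 <- length(y)
  r  <- rank(c(x, y))                  # ranks in the pooled sample
  W  <- sum(r[seq_len(n1)])            # rank sum of the first sample
  mu_W    <- n1 * (n1 + n2 + 1) / 2    # expected rank sum under H0+
  sigma_W <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
  z  <- (W - mu_W) / sigma_W
  z1 <- epsilon - z                    # tests H01: Z >= epsilon
  z2 <- z + epsilon                    # tests H02: Z <= -epsilon
  p.values <- c(p1 = pnorm(z1, lower.tail = FALSE),
                p2 = pnorm(z2, lower.tail = FALSE))
  list(W = W, mu_W = mu_W, sigma_W = sigma_W,
       statistics = c(z1 = z1, z2 = z2),
       p.values = p.values,
       equivalent = all(p.values <= alpha))
}

res <- tost_rank_sum_sketch(x = c(1, 3, 5), y = c(2, 4, 6, 8), epsilon = 2)
print(res)
```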
Remarks
Following Tryon and Lewis (2008), when rejection decisions from both tests for difference (e.g., \(\text{H}_{0}^{+}\text{: }W - \mu_W = 0\)) and tests for equivalence (e.g., either \(\text{H}_{0}^{-}\text{: }|W- \mu_W| \ge \Delta\), or \(\text{H}_{0}^{-}\text{: }|Z| \ge \varepsilon\)) are combined, there are four possible interpretations for a given \(\alpha\) and \(\Delta\) or \(\varepsilon\):
One may reject \(\text{H}_{0}^{+}\), but fail to reject \(\text{H}_{0}^{-}\), and conclude that there is relevant \(\boldsymbol{0}^{\textbf{th}}\)-order stochastic dominance between the first and second groups which is at least as large as \(\varepsilon\) or \(\Delta\).
One may fail to reject \(\text{H}_{0}^{+}\), but reject \(\text{H}_{0}^{-}\), and conclude that there is \(\boldsymbol{0}^{\textbf{th}}\)-order stochastic equivalence between the first and second groups within the equivalence range (i.e. defined by \(\varepsilon\) or \(\Delta\)).
One may reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and conclude that there is a trivial \(\boldsymbol{0}^{\textbf{th}}\)-order stochastic dominance between the first and second groups which lies within the equivalence range (i.e. defined by \(\varepsilon\) or \(\Delta\)).
One may fail to reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and draw an indeterminate conclusion, because the data are underpowered to detect either \(\boldsymbol{0}^{\textbf{th}}\)-order stochastic dominance or equivalence.
Value
tost.rank.sum returns:
statistics |
a vector of the z statistics for the two one-sided tests; if |
p.values |
a vector of p values for the z tests. |
rank_sums |
a vector containing the rank sums in each group, and the rank sum expected under the positivist null hypothesis. |
sample_sizes |
a vector containing the sample sizes in both groups, as well as the combined sample size of both groups. |
var_adj |
a scalar containing the adjusted variance under the positivist null hypothesis. |
threshold |
a scalar containing the equivalence threshold when |
conclusion |
a string containing the relevance test conclusion when |
Author(s)
Alexis Dinno (alexis.dinno@pdx.edu)
Please contact me with any questions, bug reports or suggestions for improvement. Fixing bugs will be facilitated by sending along:
a copy of the data (de-labeled or anonymized is fine),
a copy of the command syntax used, and
a copy of the exact output of the command.
I am indebted to my winter 2013 and fall 2023 students for their inspiration. Much appreciation to Mick McVeety for troubleshooting the translation of my Stata tost package to R.
Suggested citation
Dinno, A. 2025. tost.rank.sum: Equivalence signed rank tests. In: tost.suite R software package.
References
Mann, H. B., and D. R. Whitney. (1947) On a test whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics 18, 50–60.
Schuirmann, D. A. (1987) A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics. 15, 657–680.
Snedecor, G. W., and W. G. Cochran. (1989) Statistical Methods. 8th ed. Ames, IA: Iowa State University Press.
Tryon, W. W., and C. Lewis. (2008) An inferential confidence interval method of establishing statistical equivalence that corrects Tryon's (2001) reduction factor. Psychological Methods. 13, 272–277.
Wellek, S. (2010) Testing Statistical Hypotheses of Equivalence and Noninferiority, Second edition. Chapman and Hall/CRC Press. p. 31.
Wilcoxon, F. (1945) Individual comparisons by ranking methods. Biometrics Bulletin. 1, 80–83.
See Also
tost.sign.rank, wilcox.test, Wilcoxon.
Examples
require("webuse")
# Setup
webuse("fuel2")
# Perform two-sample rank-sum relevance test on mpg by using the two
# groups defined by treat; equivalence interval is +/- 1 sd beyond the
# critical value of Z for alpha = 0.1.
tost.rank.sum(
x=fuel2$mpg,
by=fuel2$treat,
eqv.type="epsilon",
eqv.level=qnorm(.9)+1,
conf.level=.9,
relevance=TRUE)
# Perform asymmetric rank-sum relevance test on mpg by using the two
# groups defined by treat, and add a continuity correction.
# The lower end of the equivalence interval = qnorm(.9)+1 = 2.281552
# meaning equivalence must lie no more than 1 sd beyond the critical value
# of Z for alpha = 0.1. The upper end of the equivalence interval
# = qnorm(.9)+.5 = 1.781552 meaning equivalence must lie no more than
# 0.5 sd beyond the critical value of Z for alpha = 0.1.
tost.rank.sum(
x=fuel2$mpg,
by=fuel2$treat,
eqv.type="epsilon",
eqv.level=qnorm(.9)+1,
upper=qnorm(.9)+.5,
conf.level=.9,
ccontinuity=TRUE,
relevance=TRUE)
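The threshold arithmetic quoted in the comments of the asymmetric example can be reproduced directly. This is only a check of the numbers; the variable names are hypothetical.

```r
# Critical value of Z for a one-sided test at alpha = 0.1
z_crit <- qnorm(0.9)

# Thresholds used in the asymmetric example
eqv_level_value <- z_crit + 1     # passed as eqv.level
upper_value     <- z_crit + 0.5   # passed as upper
print(round(c(eqv.level = eqv_level_value, upper = upper_value), 6))
```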
Linear regression tests for equivalence
Description
Performs linear regression tests for equivalence.
Usage
tost.regress(
formula,
data = NULL,
eqv.type = equivalence.types,
eqv.level = 1,
upper = NA,
conf.level = 0.95,
relevance = TRUE)
equivalence.types
#c("delta", "epsilon")
Arguments
formula |
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'. |
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from |
eqv.type |
either a single string ( |
eqv.level |
either a single numerical value, or a vector of numerical values—one for each regression coefficient estimated—defines the equivalence threshold for the tests depending on whether |
upper |
either a single numerical value, or a vector of numerical values—one for each regression coefficient estimated—which defines the upper equivalence threshold for a coefficient's equivalence interval; is assumed to be positive, and transforms the meaning of |
conf.level |
confidence level of the interval, and complement of the test's nominal type I error rate \(\alpha\). |
relevance |
reports results and inference for combined tests for difference and for equivalence for a specific |
Details
tost.regress tests for the equivalence of each regression coefficient and zero within separate symmetric equivalence intervals defined by eqv.type and eqv.level using a two one-sided t tests approach (Schuirmann, 1987). Typically (‘positivist’) null hypotheses are framed from an assumption of a lack of difference between two quantities, and reject this assumption only with sufficient evidence. When performing tests for equivalence, one frames a (‘negativist’) null hypothesis with the assumption that two quantities are different by at least as much as an equivalence interval defined by some chosen level of tolerance. Note: This version of tost.regress does not yet implement survey regression, bootstrap or jackknife estimation, or regression with robust or cluster standard errors, and currently implements only the simplest OLS functionality found in the Stata program tostregress.
An equivalence null hypothesis takes one of the following two forms depending on whether equivalence is defined in terms of \(\Delta\) (equivalence expressed in the same units as the x and y variables) or in terms of \(\varepsilon\) (equivalence expressed in the units of the t distribution with the given degrees of freedom):
\(\phantom{22}\text{H}_{0}^{-}\text{: }|\beta_{x}| \ge \Delta\),
\(\phantom{22}\)where the equivalence interval ranges from \(\beta_x - \Delta\) to \(\beta_x + \Delta\). This translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{ H}_{01}^{-}\text{: }\beta_{x} \ge \Delta\), or
\(\phantom{2222}\text{ H}_{02}^{-}\text{: }\beta_{x} \le -\Delta\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }|T| \ge \varepsilon ,\)
\(\phantom{22}\)where the equivalence interval ranges from \(-\varepsilon\) to \(\varepsilon\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }T \ge \varepsilon\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }T \le -\varepsilon\).
When an asymmetric equivalence interval is defined using the upper option the general negativist null hypothesis becomes:
\(\phantom{22}\text{H}_{0}^{-}\text{: }\beta_{x} \le \Delta_{\text{lower}}\), or \(\beta_{x} \ge \Delta_{\text{upper}}\)
\(\phantom{22}\)where the equivalence interval ranges from \(\left(\beta_{x}\right) + \Delta_{\text{lower}}\) to \(\left(\beta_{x}\right) + \Delta_{\text{upper}}\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }\beta_{x} \ge \Delta_{\text{upper}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }\beta_{x} \le \Delta_{\text{lower}}\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }T \le \varepsilon_{\text{lower}}\), or \(T \ge \varepsilon_{\text{upper}}\), with:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }T \ge \varepsilon_{\text{upper}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }T \le \varepsilon_{\text{lower}}\).
NOTE: the appropriate level of \(\alpha = (1 - \)conf.level\()\) is precisely the same as in the corresponding two-sided test for difference, so that, for example, if one wishes to make a type I error 1% of the time, one simply conducts both of the one-sided tests of \(\text{H}_{01}^{-}\) and \(\text{H}_{02}^{-}\) by comparing the resulting p-values to 0.01 (Tryon and Lewis, 2008; Wellek, 2010).
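For the symmetric \(\Delta\) form, the two one-sided t statistics can be sketched from an ordinary `lm()` fit. This is an illustration under stated assumptions, not the package's code: the built-in `mtcars` data stand in for the Stata `auto` data, the tolerance `delta = 5` is arbitrary, and both p-values are taken from the upper tail of the t distribution with the residual degrees of freedom.

```r
# Fit an OLS model on a built-in data set (stand-in for the auto data)
fit  <- lm(mpg ~ wt + am, data = mtcars)
co   <- summary(fit)$coefficients
b    <- co[, "Estimate"]
se   <- co[, "Std. Error"]
df_r <- fit$df.residual

# Arbitrary symmetric tolerance, in the units of each coefficient
delta <- 5

# Two one-sided t statistics and their upper-tail p-values
t1 <- (delta - b) / se   # tests H01: beta >= delta
t2 <- (b + delta) / se   # tests H02: beta <= -delta
p1 <- pt(t1, df = df_r, lower.tail = FALSE)
p2 <- pt(t2, df = df_r, lower.tail = FALSE)
print(cbind(T1 = t1, T2 = t2, P1 = p1, P2 = p2))
```

A coefficient would be declared equivalent to zero within the tolerance only when both of its p-values fall at or below \(\alpha\).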
Remarks
As described by Tryon and Lewis (2008), when rejection decisions from both tests for difference (e.g., \(\text{H}_{0}^{+}\text{: }\beta_{x} = 0\)) and tests for equivalence (e.g., either \(\text{H}_{0}^{-}\text{: }|\beta_{x}| \ge \Delta\), or \(\text{H}_{0}^{-}\text{: }|T| \ge \varepsilon\)) are combined, there are four possible interpretations for a given \(\alpha\) and \(\Delta\) or \(\varepsilon\):
One may reject \(\text{H}_{0}^{+}\), but fail to reject \(\text{H}_{0}^{-}\), and conclude that there is a relevant difference between \(\beta_{x}\) and zero at least as large as \(\Delta\) or \(\varepsilon\).
One may fail to reject \(\text{H}_{0}^{+}\), but reject \(\text{H}_{0}^{-}\), and conclude that there is equivalence between \(\beta_{x}\) and zero within the equivalence range (i.e. defined by \(\Delta\) or \(\varepsilon\)).
One may reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and conclude that there is a trivial difference between \(\beta_{x}\) and zero which lies within the equivalence range (i.e. defined by \(\Delta\) or \(\varepsilon\)).
One may fail to reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and draw an indeterminate conclusion, because the data are underpowered to detect either difference or equivalence.
Value
tost.regress returns:
N |
the sample size. |
df_m |
the model degrees of freedom. |
df_r |
the residual degrees of freedom. |
F |
the F statistic. |
r2 |
\(R^2\). |
rmse |
root mean squared error. |
mss |
model sum of squares. |
rss |
residual sum of squares. |
r2_a |
adjusted \(R^2\). |
alpha |
1 - |
T1 |
vector containing the value of the \(t_1\) test statistics. |
T2 |
vector containing the value of the \(t_2\) test statistics. |
T_pos |
if |
P1 |
vector of p values corresponding to the test statistics in |
P2 |
vector of p values corresponding to the test statistics in |
P_pos |
if |
SE |
vector of estimated standard deviations of the regression coefficients corresponding to |
V |
variance-covariance matrix corresponding to |
Beta |
vector of standardized regression coefficients corresponding to |
thresholds_lower |
vector containing the lower equivalence thresholds. |
thresholds_upper |
vector containing the upper equivalence thresholds. |
conclusions |
if |
Author(s)
Alexis Dinno (alexis.dinno@pdx.edu)
Please contact me with any questions, bug reports or suggestions for improvement. Fixing bugs will be facilitated by sending along:
a copy of the data (de-labeled or anonymized is fine),
a copy of the command syntax used, and
a copy of the exact output of the command.
I am indebted to my winter 2013 and fall 2023 students for their inspiration. Much appreciation to Mick McVeety for troubleshooting the translation of my Stata tost package to R.
Suggested citation
Dinno, A. 2025. tost.regress: Linear regression tests for equivalence. In: tost.suite R software package.
References
Schuirmann, D. A. (1987) A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics. 15, 657–680.
Tryon, W. W., and C. Lewis. (2008) An inferential confidence interval method of establishing statistical equivalence that corrects Tryon's (2001) reduction factor. Psychological Methods. 13, 272–277.
Wellek, S. (2010) Testing Statistical Hypotheses of Equivalence and Noninferiority, second edition. Chapman and Hall/CRC Press. p. 31.
See Also
lm.
Examples
require("webuse")
# Setup
webuse("auto")
# Report equivalence tests for a linear regression; equivalence interval is
# +/- 1 sd beyond the critical value of T for alpha = 0.05 and df = 71, and
# where sd = sqrt(df/(df-2)).
tost.regress(
auto$mpg ~ auto$weight + auto$foreign,
eqv.type="epsilon",
eqv.level=qt(.95, df=71)+1*sqrt(71/(71-2)),
conf.level=0.95,
relevance=FALSE)
# Report relevance tests for a linear regression; equivalence interval is
# +/- 1 sd beyond the critical value of T for alpha = 0.05 and df = 71.
tost.regress(
auto$mpg ~ auto$weight + auto$foreign,
eqv.type="epsilon",
eqv.level=qt(.95, df=71)+1*sqrt(71/(71-2)),
conf.level=0.95,
relevance=TRUE)
# Setup
webuse("auto")
auto["gp100m"] <- 100/auto$mpg
# Fit a better linear regression, from a physics standpoint, but add
# asymmetric intervals, and report relevance test results. The lower end of
# the equivalence interval = qt(.95, 71)+1.5*sqrt(71/(71-2)) = 3.188184 meaning
# equivalence must lie no more than 1.5 sd beyond the critical value of T for
# alpha = 0.05 and df = 71. The upper end of the equivalence interval =
# qt(.95, 71)+1*sqrt(71/(71-2)) = 2.680989 meaning equivalence must lie no more
# than 1 sd beyond the critical value of T for alpha = 0.05 and df = 71, and
# where sd = sqrt(df/(df-2)).
tost.regress(
auto$gp100m ~ auto$weight + auto$foreign,
eqv.type="epsilon",
eqv.level=qt(.95, df=71)+1.5*sqrt(71/(71-2)),
upper=qt(.95, df=71)+1*sqrt(71/(71-2)),
conf.level=0.95,
relevance=TRUE)
# Obtain standardized regression coefficients from the above model
tost.regress(
auto$gp100m ~ auto$weight + auto$foreign,
eqv.type="epsilon",
eqv.level=qt(.95, df=71)+1.5*sqrt(71/(71-2)),
upper=qt(.95, df=71)+1*sqrt(71/(71-2)),
conf.level=0.95,
relevance=TRUE)$Beta
# Report equivalence tests when suppressing the intercept term
tost.regress(
auto$weight ~ 0 + auto$length,
eqv.type="delta",
eqv.level=5,
conf.level=0.95,
relevance=FALSE)
# Report equivalence tests when the model already has a constant; express
# equivalence interval in units of the variable only for length, and in units
# of the test statistic for each level of foreign. For the latter, the
# equivalence interval is +/- 1 sd beyond the critical value of T for
# alpha = 0.05.
tost.regress(
auto$weight ~ 0 + auto$length + as.factor(auto$foreign),
eqv.type=c("delta", "epsilon", "epsilon"),
eqv.level=c(5, qt(.95, 71)+1*sqrt(71/(71-2)), qt(.95, 71)+1*sqrt(71/(71-2))),
conf.level=0.95,
relevance=FALSE)
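The epsilon thresholds in the examples above add a multiple of the standard deviation of the t distribution, sqrt(df/(df-2)), to the one-sided critical value. A short check of that arithmetic, using the df = 71 stated in the example comments (variable names are hypothetical):

```r
# Residual degrees of freedom stated in the example comments
df <- 71

# One-sided critical value of T for alpha = 0.05, and the sd of the t distribution
t_crit <- qt(0.95, df = df)
t_sd   <- sqrt(df / (df - 2))

# Thresholds 1 sd and 1.5 sd beyond the critical value
eqv_one_sd      <- t_crit + 1 * t_sd
eqv_one_half_sd <- t_crit + 1.5 * t_sd
print(round(c(eqv_one_sd, eqv_one_half_sd), 6))
```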
Test for equivalence of relative risk and unity in paired binary data
Description
Performs two one-sided z tests for equivalence of marginal probabilities in binary data following Tang, Tang, and Chan (2003).
Usage
tost.rrp(
x=NA, y=NA,
delta0 = 1,
deltaupper = NA,
exact.chisq = FALSE,
conf.level = 0.95,
treatment1 = "",
treatment2 = "",
outcome = "",
nooutcome = "",
relevance = TRUE)
Arguments
x |
a (non-empty) vector of binary data values of equal length to |
y |
a (non-empty) vector of binary data values of equal length to |
delta0 |
a required real value between 0 and 1 defining the lower threshold of an equivalence interval around RR=1. The upper boundary is |
deltaupper |
an optional value greater than 1 which is other than |
exact.chisq |
indicates that Fisher’s exact p-value will be used for the positivist test (i.e. for McNemar's \(\chi^2\) test). This probability is calculated as \(2\sum_{i=0}^{\min(b,c)}\text{Binomial}\left(n=b+c, k=i, p=0.5\right)\). |
conf.level |
confidence level of the interval, and complement of the test's nominal type I error rate \(\alpha\). |
treatment1 |
an optional string to label the first treatment group in the output (e.g., "Treated"). If unspecified, |
treatment2 |
an optional string to label the second treatment group in the output (e.g., "Untreated"). If unspecified, |
outcome |
an optional string to label those with the outcome (e.g., "Cases"). If unspecified |
nooutcome |
an optional string to label those without the outcome (e.g., "Not cases"). If unspecified |
relevance |
reports results and inference for combined tests for difference and for equivalence for a specific |
Details
tost.rrp tests for equivalence of the relative risk of a positive outcome and unity in paired (or matched) randomized control trial or paired (or matched) cohort design data. It calculates an asymptotic z test statistic based on a reparameterized multinomial model (Tang, et al., 2003) in a two one-sided tests approach (Schuirmann, 1987). The equivalence interval for the test is defined by a chosen level of tolerance, as specified by delta0.
The two one-sided null hypotheses take on the following form based on the relative risk (RR), and the threshold delta0:
\(\phantom{22}\text{H}_{01}^{-}\text{: RR} \le \delta_0\text{, or}\)
\(\phantom{2222}\text{H}_{02}^{-}\text{: RR} \ge \frac{1}{\delta_0}\text{.}\)
\(\phantom{2222}\)where the equivalence interval ranges from \(\delta_0\) to \(\frac{1}{\delta_0}\).
When a geometrically asymmetric equivalence interval is defined using the deltaupper option the two one-sided null hypotheses become:
\(\phantom{22}\text{H}_{01}^{-}\text{: RR} \le \delta_0\text{, or}\)
\(\phantom{2222}\text{H}_{02}^{-}\text{: RR} \ge \delta_{\text{upper}}\text{.}\)
where the equivalence interval ranges from \(\delta_0\) to \(\delta_{\text{upper}}\).
The two z test statistics, \(z_1\) and \(z_2\), are both constructed with rejection probabilities in the upper tails. So \(p_1 = P(Z\ge z_1)\), and \(p_2 = P(Z\ge z_2)\).
NOTES: When \(\delta_0 = 1\), the Tang-Tang-Chan test statistic reduces to McNemar's \(\chi^2\) test statistic (McNemar, 1947). When \(a = b = c = 0\), there are no positive outcomes in either treatment group, and the RR and test statistics become undefined. If \(a > 0\), and \(b = c = 0\), then there is complete concordance, and \(z_1 = z_2\), so \(p_1 = p_2\). As is standard with two one-sided tests for equivalence, if one wishes to make a type I error 5% of the time, one simply conducts both of the one-sided tests of \(\text{H}_{01}^{-}\) and \(\text{H}_{02}^{-}\) by comparing the resulting p-values to 0.05 (Wellek, 2010).
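For the positivist test, McNemar's \(\chi^2\) statistic depends only on the discordant pair counts \(b\) and \(c\). A minimal sketch with hypothetical counts follows; it uses the standard form \((b-c)^2/(b+c)\) without continuity correction and the exact binomial sum given for the exact.chisq argument. Capping the exact p-value at 1 is an assumption added here.

```r
# Hypothetical discordant pair counts
n_b <- 3
n_c <- 7

# McNemar's chi-squared statistic (1 df) and its asymptotic p-value
chi2   <- (n_b - n_c)^2 / (n_b + n_c)
p_asym <- pchisq(chi2, df = 1, lower.tail = FALSE)

# Exact p-value: twice the binomial lower tail up to min(b, c), capped at 1
p_exact <- min(1, 2 * pbinom(min(n_b, n_c), size = n_b + n_c, prob = 0.5))
print(c(chi2 = chi2, p_asym = p_asym, p_exact = p_exact))
```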
Remarks
As described by Tryon and Lewis (2008), when rejection decisions from both tests for difference (i.e. \(\text{H}_{0}^{+}\text{: RR}= 1\)) and tests for equivalence (i.e. \(\text{H}_{01}^{-}\text{: RR} \le \delta_{0}\), or \(\text{H}_{02}^{-}\text{: RR} \ge \frac{1}{\delta_0}\)) are combined, there are four possible interpretations for a given \(\alpha\) and \(\delta_0\):
One may reject \(\text{H}_{0}^{+}\), but fail to reject both \(\text{H}_{01}^{-}\text{ and }\text{H}_{02}^{-}\), and conclude that there is a relevant difference between RR and 1 at least as large as the interval defined by \(\delta_0\).
One may fail to reject \(\text{H}_{0}^{+}\), but reject both \(\text{H}_{01}^{-}\text{ and }\text{H}_{02}^{-}\), and conclude that there is equivalence between RR and 1 within the interval defined by \(\delta_0\).
One may reject \(\text{H}_{0}^{+}\) and reject both \(\text{H}_{01}^{-}\text{ and }\text{H}_{02}^{-}\), and conclude that there is a trivial difference between RR and 1 which lies within the interval defined by \(\delta_0\).
One may fail to reject \(\text{H}_{0}^{+}\) and fail to reject both \(\text{H}_{01}^{-}\text{ and }\text{H}_{02}^{-}\), and draw an indeterminate conclusion, because the data are underpowered to detect either difference or equivalence.
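The four interpretations above amount to a two-by-two decision table. A minimal sketch follows, assuming every test is compared to the same alpha and reading "fail to reject both" as "did not reject both one-sided nulls". The helper relevance_conclusion and its labels are hypothetical and are not part of the package.

```r
# Hypothetical helper (not a package function): map rejection decisions
# onto the four interpretations of a relevance test.
relevance_conclusion <- function(p_diff, p1, p2, alpha = 0.05) {
  reject_pos <- p_diff < alpha                 # test for difference (H0+)
  reject_neg <- (p1 < alpha) && (p2 < alpha)   # both one-sided tests (H01-, H02-)
  if (reject_pos && !reject_neg) {
    "Relevant difference"
  } else if (!reject_pos && reject_neg) {
    "Equivalence"
  } else if (reject_pos && reject_neg) {
    "Trivial difference"
  } else {
    "Indeterminate"
  }
}

# Illustrative p-values only
print(relevance_conclusion(p_diff = 0.40, p1 = 0.01, p2 = 0.02))
```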
Value
tost.rrp returns:
statistics |
a vector containing the value of \(z_{1}\) and \(z_{2}\); if |
p.values |
a vector of p values for the z tests, and, if |
estimate |
the estimated relative risk (aka incidence rate ratio) of positive outcome for treatment 2 vs. treatment 1. |
error |
the estimated standard deviation of relative risk based on the score statistic per (Tang, et al., 2003). |
threshold |
a scalar (\(\delta_0\)) containing the equivalence threshold when |
conclusion |
relevance test conclusion for a given \(\alpha\) and \(\delta_0\), or \(\delta_l\) and \(\delta_u\). |
Author(s)
Alexis Dinno (alexis.dinno@pdx.edu)
Please contact me with any questions, bug reports or suggestions for improvement. Fixing bugs will be facilitated by sending along:
a copy of the data (de-labeled or anonymized is fine),
a copy of the command syntax used, and
a copy of the exact output of the command.
Suggested citation
Dinno, A. 2025. tost.rrp: Test for equivalence of relative risk and unity in paired binary data. In: tost.suite R software package.
References
Lachenbruch, P. A. and Lynch, C. J. (1998) Assessing screening tests: Extensions of McNemar's test. Statistics In Medicine 17, 2207–2217.
McNemar, Q. (1947) Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 12, 153–157.
Schuirmann, D. A. (1987) A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics 15, 657–680.
Tang, N.-S., Tang, M.-L., and Chan, I. S. F. (2003) On tests of equivalence via non-unity relative risk for matched-pair design. Statistics In Medicine 22, 1217–1233.
Tryon, W. W., and C. Lewis. (2008) An inferential confidence interval method of establishing statistical equivalence that corrects Tryon's (2001) reduction factor. Psychological Methods 13, 272–277.
Wellek, S. (2010) Testing Statistical Hypotheses of Equivalence and Noninferiority, second edition. Chapman and Hall/CRC Press. p. 31
See Also
Examples
# Setup
data(hivfluid)
# Relevance test example from Tang, et al., 2003, Table II, based on data from
# Lachenbruch and Lynch, 1998 with equivalence interval .95 to 1.052632
# (1/.95 = 1.052632)
tost.rrp(
x=hivfluid$plasma,
y=hivfluid$alternate,
delta0=.95,
outcome="HIV Positive",
nooutcome="HIV Negative",
relevance=TRUE)
Immediate test for equivalence of relative risk and unity in paired binary data
Description
Immediately performs two one-sided z tests for equivalence of marginal probabilities in binary data following Tang, Tang, and Chan, 2003
Usage
tost.rrpi(
a = NA, b = NA, c = NA, n = NA,
delta0 = 1,
deltaupper = NA,
exact.chisq = FALSE,
conf.level = 0.95,
treatment1 = "",
treatment2 = "",
outcome = "",
nooutcome = "",
relevance = TRUE)
Arguments
a |
a non-negative integer indicating the number of paired observations in which both the first treatment and the second treatment are positive for the outcome. |
b |
a non-negative integer indicating the number of paired observations with first treatment negative and second treatment positive for the outcome. |
c |
a non-negative integer indicating the number of paired observations with first treatment positive and second treatment negative for the outcome. |
n |
a non-negative integer indicating the total number of paired observations. \(n = a + b + c + d\) (\(d\), which is not directly provided, equals \(n - a - b - c\)). |
delta0 |
a required real value between 0 and 1 defining the lower threshold of an equivalence interval around RR=1. The upper boundary is |
deltaupper |
an optional value greater than 1 which is other than |
exact.chisq |
indicates that Fisher’s exact p-value will be used for the positivist test (i.e. for McNemar's \(\chi^2\) test). This probability is calculated as \(2\sum_{i=0}^{\min(b,c)}\text{Binomial}\left(n=b+c, k=i, p=0.5\right)\). |
conf.level |
confidence level of the interval, and complement of the test's nominal type I error rate \(\alpha\). |
treatment1 |
an optional string to label the first treatment group in the output (e.g., "Treated"). If unspecified, |
treatment2 |
an optional string to label the second treatment group in the output (e.g., "Untreated"). If unspecified, |
outcome |
an optional string to label those with the outcome (e.g., "Cases"). If unspecified |
nooutcome |
an optional string to label those without the outcome (e.g., "Not cases"). If unspecified |
relevance |
reports results and inference for combined tests for difference and for equivalence for a specific |
Details
Immediate commands perform tests given summary statistics, rather than given data. tost.rrpi tests for equivalence of the relative risk of a positive outcome and unity in paired (or matched) randomized control trial or paired (or matched) cohort design data. It calculates an asymptotic z test statistic based on a reparameterized multinomial model (Tang, et al., 2003) in a two one-sided tests approach (Schuirmann, 1987). tost.rrp is the non-immediate form of tost.rrpi. The equivalence interval for the test is defined by a chosen level of tolerance, as specified by delta0.
The two one-sided null hypotheses take on the following form based on the relative risk (RR), and the threshold delta0:
\(\phantom{22}\text{H}_{01}^{-}\text{: RR} \le \delta_0\text{, or}\)
\(\phantom{2222}\text{H}_{02}^{-}\text{: RR} \ge \frac{1}{\delta_0}\text{.}\)
\(\phantom{2222}\)where the equivalence interval ranges from \(\delta_0\) to \(\frac{1}{\delta_0}\).
When a geometrically asymmetric equivalence interval is defined using the deltaupper option the two one-sided null hypotheses become:
\(\phantom{22}\text{H}_{01}^{-}\text{: RR} \le \delta_0\text{, or}\)
\(\phantom{2222}\text{H}_{02}^{-}\text{: RR} \ge \delta_{\text{upper}}\text{.}\)
where the equivalence interval ranges from \(\delta_0\) to \(\delta_{\text{upper}}\).
The two z test statistics, \(z_1\) and \(z_2\), are both constructed with rejection probabilities in the upper tails. So \(p_1 = P(Z\ge z_1)\), and \(p_2 = P(Z\ge z_2)\).
NOTES: When \(\delta_0 = 1\), the Tang-Tang-Chan test statistic reduces to McNemar's \(\chi^2\) test statistic (McNemar, 1947). When \(a = b = c = 0\), there are no positive outcomes in either treatment group, and the RR and test statistics become undefined. If \(a > 0\), and \(b = c = 0\), then there is complete concordance, and \(z_1 = z_2\), so \(p_1 = p_2\). As is standard with two one-sided tests for equivalence, if one wishes to make a type I error 5% of the time, one simply conducts both of the one-sided tests of \(\text{H}_{01}^{-}\) and \(\text{H}_{02}^{-}\) by comparing the resulting p-values to 0.05 (Wellek, 2010).
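For reference, McNemar's \(\chi^2\) statistic named in the note depends only on the discordant counts \(b\) and \(c\). The sketch below computes it by hand for the counts used in this topic's Examples and cross-checks it against base R's mcnemar.test without continuity correction. It illustrates McNemar's statistic only. It does not reproduce the Tang-Tang-Chan score statistic or the internals of tost.rrpi.

```r
# Counts from the HIV fluid example (a, b, c, n as defined in Arguments)
n_a <- 446; n_b <- 5; n_c <- 16; n <- 1157
n_d <- n - n_a - n_b - n_c   # concordant negatives, not supplied directly

# McNemar's chi-square uses only the discordant pairs
mcnemar_chisq <- (n_b - n_c)^2 / (n_b + n_c)
p_value <- pchisq(mcnemar_chisq, df = 1, lower.tail = FALSE)

# Cross-check with base R (2 x 2 table, no continuity correction)
tab <- matrix(c(n_a, n_c, n_b, n_d), nrow = 2)
check <- mcnemar.test(tab, correct = FALSE)

print(c(by_hand = mcnemar_chisq, base_r = unname(check$statistic)))
```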
Remarks
As described by Tryon and Lewis (2008), when rejection decisions from both tests for difference (i.e. \(\text{H}_{0}^{+}\text{: RR}= 1\)) and tests for equivalence (i.e. \(\text{H}_{01}^{-}\text{: RR} \le \delta_{0}\), or \(\text{H}_{02}^{-}\text{: RR} \ge \frac{1}{\delta_0}\)) are combined, there are four possible interpretations for a given \(\alpha\) and \(\delta_0\):
One may reject \(\text{H}_{0}^{+}\), but fail to reject both \(\text{H}_{01}^{-}\text{ and }\text{H}_{02}^{-}\), and conclude that there is a relevant difference between RR and 1 at least as large as the interval defined by \(\delta_0\).
One may fail to reject \(\text{H}_{0}^{+}\), but reject both \(\text{H}_{01}^{-}\text{ and }\text{H}_{02}^{-}\), and conclude that there is equivalence between RR and 1 within the interval defined by \(\delta_0\).
One may reject \(\text{H}_{0}^{+}\) and reject both \(\text{H}_{01}^{-}\text{ and }\text{H}_{02}^{-}\), and conclude that there is a trivial difference between RR and 1 which lies within the interval defined by \(\delta_0\).
One may fail to reject \(\text{H}_{0}^{+}\) and fail to reject both \(\text{H}_{01}^{-}\text{ and }\text{H}_{02}^{-}\), and draw an indeterminate conclusion, because the data are underpowered to detect either difference or equivalence.
Value
tost.rrpi returns:
statistics |
a vector containing the value of \(z_{1}\) and \(z_{2}\); if |
p.values |
a vector of p values for the z tests, and, if |
estimate |
the estimated relative risk (aka incidence rate ratio) of positive outcome for treatment 2 vs. treatment 1. |
error |
the estimated standard deviation of relative risk based on the score statistic per (Tang, et al., 2003). |
threshold |
a scalar (\(\delta_0\)) containing the equivalence threshold when |
conclusion |
relevance test conclusion for a given \(\alpha\) and \(\delta_0\), or \(\delta_l\) and \(\delta_u\). |
Author(s)
Alexis Dinno (alexis.dinno@pdx.edu)
Please contact me with any questions, bug reports or suggestions for improvement. Fixing bugs will be facilitated by sending along:
a copy of the data (de-labeled or anonymized is fine),
a copy of the command syntax used, and
a copy of the exact output of the command.
Suggested citation
Dinno, A. 2025. tost.rrpi: Test for equivalence of relative risk and unity in paired binary data. In: tost.suite R software package.
References
Lachenbruch, P. A. and Lynch, C. J. (1998) Assessing screening tests: Extensions of McNemar's test. Statistics In Medicine 17, 2207–2217.
McNemar, Q. (1947) Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 12, 153–157.
Schuirmann, D. A. (1987) A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics 15, 657–680.
Tang, N.-S., Tang, M.-L., and Chan, I. S. F. (2003) On tests of equivalence via non-unity relative risk for matched-pair design. Statistics In Medicine 22, 1217–1233.
Tryon, W. W., and C. Lewis. (2008) An inferential confidence interval method of establishing statistical equivalence that corrects Tryon's (2001) reduction factor. Psychological Methods 13, 272–277.
Tango, T. (1998) Equivalence test and confidence interval for the difference in proportions for the paired-sample design. Statistics In Medicine 17, 891–908.
Wellek, S. (2010) Testing Statistical Hypotheses of Equivalence and Noninferiority, second edition. Chapman and Hall/CRC Press. p. 31
See Also
Examples
# Same as the relevance test example from Tang, et al., 2003, Table II in
# tost.rrp, based on data from Lachenbruch and Lynch, 1998 with equivalence
# interval .95 to 1.052632, but using the immediate command.
tost.rrpi(a=446, b=5, c=16, n=1157,
delta0=.95,
treatment1="Plasma sample",
treatment2="Alternate fluid",
outcome="HIV Positive",
nooutcome="HIV Negative",
relevance=TRUE)
# Same as above, but using the exact p-value for the positivist test.
# Positivist test and relevance test conclusions change
tost.rrpi(a=446, b=5, c=16, n=1157,
delta0=.95,
treatment1="Plasma sample",
treatment2="Alternate fluid",
outcome="HIV Positive",
nooutcome="HIV Negative",
exact.chisq=TRUE,
relevance=TRUE)
# Example from Tang, et al., 2003, Table V, based on data from Tango, 1998
# Using exact.chisq=TRUE because expected counts are tiny in some cells
tost.rrpi(a=43, b=0, c=1, n=44,
delta0=.9,
treatment1="Thermal",
treatment2="Chemical",
outcome="Effective",
nooutcome="Ineffective",
exact.chisq=TRUE,
relevance=FALSE)
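The exact p-value described under the exact.chisq argument is a doubled binomial tail over the discordant pairs. A minimal base R sketch of that formula follows, using the discordant counts from the HIV example above. Capping the result at 1 is an added safeguard assumed here, not something stated in the argument description.

```r
# Discordant counts from the HIV fluid example
n_b <- 5
n_c <- 16
m <- n_b + n_c        # number of discordant pairs
k <- min(n_b, n_c)

# 2 * sum_{i = 0}^{min(b, c)} Binomial(n = b + c, k = i, p = 0.5)
p_exact <- min(1, 2 * sum(dbinom(0:k, size = m, prob = 0.5)))

# Cross-check against base R's two-sided exact binomial test
p_check <- binom.test(n_b, m, p = 0.5)$p.value

print(c(formula = p_exact, binom_test = p_check))
```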
Test for the distribution of paired or matched data being equivalent to one that is symmetrical & centered on zero
Description
Performs two one-sided approximate z tests for equivalence between the distribution of paired differences and a distribution which is both symmetric and centered on zero.
Usage
tost.sign.rank(
x, y,
eqv.type = equivalence.types,
eqv.level = 1,
upper = NA,
ccontinuity = FALSE,
conf.level = 0.95,
x.name = "",
y.name = "",
relevance = TRUE)
equivalence.types
#c("delta", "epsilon")
Arguments
x |
a (non-empty) numeric vector of data values. |
y |
a (non-empty) numeric vector of data values. |
eqv.type |
defines whether the equivalence interval will be defined in terms of \(\varepsilon\) or \(\Delta\) ( |
eqv.level |
defines the equivalence threshold for the tests depending on whether |
upper |
defines the upper equivalence threshold for the test, is assumed to be positive, and transforms the meaning of |
ccontinuity |
calculates test statistics for both positivist and negativist tests using a continuity correction. For the positivist test the approximate statistic \(z = \tfrac{\text{sgn}(T)\times(|T-\mu_{T}|-0.5)}{\sigma_{T}}\). |
conf.level |
confidence level of the interval, and complement of the test's nominal type I error rate \(\alpha\). |
x.name |
specifies how the first variable will be labeled in the output. The default value of |
y.name |
specifies how the second variable will be labeled in the output. The default value of |
relevance |
reports results and inference for combined tests for difference and for equivalence for a specific |
Details
tost.sign.rank tests the null hypothesis that the paired differences in measures are not symmetrically distributed and/or are not centered on the value of zero, and provides evidence for the distribution of paired differences being equivalent to one that is symmetric and centered on zero. tost.sign.rank uses the z approximation to the Wilcoxon matched-pairs signed-ranks test (Wilcoxon 1945) in a two one-sided tests approach (Schuirmann, 1987).
With respect to the signed-rank test, a negativist null hypothesis takes one of the following two forms depending on whether tolerance is defined in terms of \(\Delta\) (equivalence expressed in the same units as the absolute value of sums of signed ranks) or in terms of \(\varepsilon\) (equivalence expressed in the units of the z distribution):
\(\phantom{22}\text{H}_{0}^{-}\text{: }|T - \mu_T| \ge \Delta\),
\(\phantom{22}\)where the equivalence interval ranges from \(\left(T - \mu_T\right) - \Delta\) to \(\left(T - \mu_T\right) + \Delta\). This translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{ H}_{01}^{-}\text{: }T - \mu_T \ge \Delta\), or
\(\phantom{2222}\text{ H}_{02}^{-}\text{: }T - \mu_T \le -\Delta\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }|Z| \ge \varepsilon ,\)
\(\phantom{22}\)where the equivalence interval ranges from \(-\varepsilon\) to \(\varepsilon\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }Z \ge \varepsilon\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }Z \le -\varepsilon\).
When an asymmetric equivalence interval is defined using the upper option the general negativist null hypothesis becomes:
\(\phantom{22}\text{H}_{0}^{-}\text{: }T - \mu_T \le \Delta_{\text{l}}\), or \(T - \mu_T \ge \Delta_{\text{u}}\)
\(\phantom{22}\)where the equivalence interval ranges from \(\left(T - \mu_T\right) + \Delta_{\text{l}}\) to \(\left(T - \mu_T\right) + \Delta_{\text{u}}\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }T - \mu_T \ge \Delta_{\text{u}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }T - \mu_T \le \Delta_{\text{l}}\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }Z \le \varepsilon_{\text{l}}\), or \(Z \ge \varepsilon_{\text{u}}\), with:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }Z \ge \varepsilon_{\text{u}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }Z \le \varepsilon_{\text{l}}\).
NOTE: the appropriate level of \(\alpha = (1 - \)conf.level\()\) is precisely the same as in the corresponding two-sided test for mean difference, so that, for example, if one wishes to make a type I error 1% of the time, one simply conducts both of the one-sided tests of \(\text{H}_{01}^{-}\) and \(\text{H}_{02}^{-}\) by comparing the resulting p-value to 0.01 (Wellek, 2010).
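To make the \(\varepsilon\) form of the two one-sided hypotheses concrete, here is a small base R sketch. It assumes, as an illustration rather than a description of the package internals, that both one-sided statistics are built to reject in the upper tail. All input values are hypothetical.

```r
# Hypothetical inputs (illustrative only, not taken from any data set)
T_obs   <- 10    # observed signed-rank statistic
mu_T    <- 0     # its expectation under the positivist null
sigma_T <- 12    # its standard deviation under the positivist null
epsilon <- qnorm(0.95) + 1.5
alpha   <- 0.05

# z approximation for the test for difference
z <- (T_obs - mu_T) / sigma_T

# H01-: Z >= epsilon  is rejected when epsilon - z is large
z1 <- epsilon - z
# H02-: Z <= -epsilon is rejected when z + epsilon is large
z2 <- z + epsilon

# Upper-tail p-values for the two one-sided tests
p <- pnorm(c(z1, z2), lower.tail = FALSE)

# Equivalence is concluded only if both p-values fall below alpha
print(p)
print(all(p < alpha))
```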
Remarks
Following Tryon and Lewis (2008), when rejection decisions from both tests for difference (e.g., \(\text{H}_{0}^{+}\text{: }T- \mu_T = 0\) or ) and tests for equivalence (e.g., either \(\text{H}_{0}^{-}\text{: }|T- \mu_T| \ge \Delta\), or \(\text{H}_{0}^{-}\text{: }|Z| \ge \varepsilon\)) are combined, there are four possible interpretations for a given \(\alpha\) and \(\Delta\) or \(\varepsilon\):
One may reject \(\text{H}_{0}^{+}\), but fail to reject \(\text{H}_{0}^{-}\), and conclude that there is a relevant difference between the distribution of paired differences and a distribution which is both symmetric and centered on zero which is at least as large as \(\varepsilon\) or \(\Delta\).
One may fail to reject \(\text{H}_{0}^{+}\), but reject \(\text{H}_{0}^{-}\), and conclude that there is equivalence between the distribution of paired differences and a distribution which is both symmetric and centered on zero within the equivalence range (i.e. defined by \(\varepsilon\) or \(\Delta\)).
One may reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and conclude that there is a trivial difference between the distribution of paired differences and a distribution which is both symmetric and centered on zero which lies within the equivalence range (i.e. defined by \(\varepsilon\) or \(\Delta\)).
One may fail to reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and draw an indeterminate conclusion, because the data are underpowered to detect either difference or equivalence.
Value
tost.sign.rank returns:
statistics |
a vector of the z statistics for the two one-sided tests; if |
p.values |
a vector of p values for the z tests. |
signed_rank_sums |
a vector containing the absolute value of positive and negative rank sums, and the signed rank sum expected under the positivist null hypothesis. |
sample_size |
a scalar containing the sample size. |
counts |
a vector containing the number of negative comparisons, number of positive comparisons, and number of tied comparisons. |
var_adj |
a scalar containing the adjusted variance under the positivist null hypothesis. |
threshold |
a scalar containing the equivalence threshold when |
conclusion |
a string containing the relevance test conclusion when |
Author(s)
Alexis Dinno (alexis.dinno@pdx.edu)
Please contact me with any questions, bug reports or suggestions for improvement. Fixing bugs will be facilitated by sending along:
a copy of the data (de-labeled or anonymized is fine),
a copy of the command syntax used, and
a copy of the exact output of the command.
Much appreciation to Mick McVeety for troubleshooting the translation of my Stata tost package to R.
Suggested citation
Dinno, A. 2025. tost.sign.rank: Test for the distribution of paired or matched data being equivalent to one that is symmetrical & centered on zero. In: tost.suite R software package.
References
Schuirmann, D. A. (1987) A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics. 15, 657–680.
Snedecor, G. W., and W. G. Cochran. (1989) Statistical Methods. 8th ed. Ames, IA: Iowa State University Press.
Tryon, W. W., and Lewis, C. (2008) An inferential confidence interval method of establishing statistical equivalence that corrects Tryon's (2001) reduction factor. Psychological Methods. 13, 272–277.
Wellek, S. (2010) Testing Statistical Hypotheses of Equivalence and Noninferiority, Second edition. Chapman and Hall/CRC Press. p. 31.
Wilcoxon, F. (1945) Individual comparisons by ranking methods. Biometrics Bulletin. 1, 80–83.
See Also
SignRank, tost.rank.sum, wilcox.test.
Examples
require("webuse")
#Setup
webuse("fuel")
# Perform sign-rank relevance test between mpg1 and mpg2; equivalence
# interval is +/- 1.5 sd beyond the critical value of Z for alpha = 0.05.
tost.sign.rank(
fuel$mpg1,
fuel$mpg2,
eqv.type="epsilon",
eqv.level=qnorm(.95)+1.5,
relevance=TRUE)
# Same example, but using an asymmetric equivalence interval and continuity
# correction. The lower end of the equivalence interval = qnorm(.95)+1.5
# = 3.144854 meaning equivalence must lie no more than 1.5 sd beyond the
# critical value of Z for alpha = 0.05. The upper end of the equivalence
# interval = qnorm(.95)+1 = 2.644854 meaning equivalence must lie
# no more than 1 sd beyond the critical value of Z for alpha = 0.05.
tost.sign.rank(
fuel$mpg1,
fuel$mpg2,
eqv.type="epsilon",
eqv.level=qnorm(.95)+1.5,
upper=qnorm(.95)+1,
ccontinuity=TRUE,
relevance=TRUE)
Mean-equivalence t tests
Description
Performs two one-sided t tests for mean equivalence
Usage
tost.t(
x,
y = NULL,
mu = NA,
by = NULL,
eqv.type = equivalence.types,
eqv.level = 1,
upper = NA,
paired = FALSE,
var.equal = FALSE,
welch = FALSE,
conf.level = 0.95,
x.name = "",
y.name = "",
by.name = "",
by.values = NULL,
relevance = TRUE)
equivalence.types
#c("delta", "epsilon")
Arguments
x |
a (non-empty) numeric vector of data values. |
y |
an optional (non-empty) numeric vector of data values. Implies |
mu |
a number indicating the true value of the mean for a one-sample test. Implies |
by |
an optional (non-empty) vector of group indicator values. Implies |
eqv.type |
defines whether the equivalence interval will be defined in terms of \(\Delta\) or \(\varepsilon\) ( |
eqv.level |
defines the equivalence threshold for the tests depending on whether |
upper |
defines the upper equivalence threshold for the test, is assumed to be positive, and transforms the meaning of |
paired |
a logical variable indicating whether you want a paired t test. Requires |
var.equal |
a logical variable indicating whether to treat the two samples as being drawn from populations with equal variances. If |
welch |
a logical variable indicating |
conf.level |
confidence level of the interval, and complement of the test's nominal type I error rate \(\alpha\). |
x.name |
specifies how the first variable will be labeled in the output. The default value of |
y.name |
specifies how the second variable will be labeled in the output when |
by.name |
an optional string to customize the grouping variable name in the output. If |
by.values |
an optional two-element character vector of group names. If none are supplied, the names of the values of |
relevance |
reports results and inference for combined tests for difference and for equivalence for a specific |
Details
tost.t tests for the equivalence of means within a symmetric equivalence interval defined by eqv.type and eqv.level using a two one-sided t tests (TOST) approach (Schuirmann, 1987). Typically "positivist" null hypotheses are framed from an assumption of a lack of difference between two quantities, and reject this assumption only with sufficient evidence. When performing tests for equivalence, one frames a null hypothesis with the assumption that two quantities differ by at least as much as an equivalence interval defined by some chosen level of tolerance.
With respect to an unpaired t test, an equivalence null hypothesis takes one of the following two forms depending on whether equivalence is defined in terms of \(\Delta\) (equivalence expressed in the same units as the x and y variables) or in terms of \(\varepsilon\) (equivalence expressed in the units of the t distribution with the given degrees of freedom):
\(\phantom{22}\text{H}_{0}^{-}\text{: }|\mu_{x} - \mu_y| \ge \Delta\),
\(\phantom{22}\)where the equivalence interval ranges from \(\left(\mu_x - \mu_y\right) - \Delta\) to \(\left(\mu_x - \mu_y\right) + \Delta\). This translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{ H}_{01}^{-}\text{: }\mu_{x} - \mu_y \ge \Delta\), or
\(\phantom{2222}\text{ H}_{02}^{-}\text{: }\mu_{x} - \mu_y \le -\Delta\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }|T| \ge \varepsilon ,\)
\(\phantom{22}\)where the equivalence interval ranges from \(-\varepsilon\) to \(\varepsilon\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }T \ge \varepsilon\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }T \le -\varepsilon\).
When an asymmetric equivalence interval is defined using the upper option the general negativist null hypothesis becomes:
\(\phantom{22}\text{H}_{0}^{-}\text{: }\mu_{x} - \mu_y \le \Delta_{\text{lower}}\), or \(\mu_{x} - \mu_y \ge \Delta_{\text{upper}}\)
\(\phantom{22}\)where the equivalence interval ranges from \(\left(\mu_x - \mu_y\right) + \Delta_{\text{lower}}\) to \(\left(\mu_x - \mu_y\right) + \Delta_{\text{upper}}\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }\mu_x - \mu_y \ge \Delta_{\text{upper}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }\mu_x - \mu_y \le \Delta_{\text{lower}}\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }T \le \varepsilon_{\text{lower}}\), or \(T \ge \varepsilon_{\text{upper}}\), with:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }T \ge \varepsilon_{\text{upper}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }T \le \varepsilon_{\text{lower}}\).
NOTE: the appropriate level of \(\alpha = (1 - \)conf.level\()\) is precisely the same as in the corresponding two-sided test for mean difference, so that, for example, if one wishes to make a type I error 1% of the time, one simply conducts both of the one-sided tests of \(\text{H}_{01}^{-}\) and \(\text{H}_{02}^{-}\) by comparing the resulting p-value to 0.01 (Tryon and Lewis, 2008).
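The \(\Delta\) form of the hypotheses can be worked through by hand from summary statistics. The base R sketch below does so for an unpaired, equal-variance comparison. The summary values are hypothetical. Two things are assumptions made for illustration rather than documented package behavior: the upper-tail construction of both statistics, and the relation epsilon = Delta / se.

```r
# Hypothetical summary statistics for two independent groups
n1 <- 12; mean1 <- 21.0; sd1 <- 2.7
n2 <- 12; mean2 <- 22.5; sd2 <- 3.1
Delta <- 4       # tolerance in the units of the data
alpha <- 0.05

# Pooled variance, standard error, and degrees of freedom
dof <- n1 + n2 - 2
sp2 <- ((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / dof
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))
d_hat <- mean1 - mean2

# H01-: mu_x - mu_y >= Delta   is rejected when t1 is large
t1 <- (Delta - d_hat) / se
# H02-: mu_x - mu_y <= -Delta  is rejected when t2 is large
t2 <- (d_hat + Delta) / se
p  <- pt(c(t1, t2), df = dof, lower.tail = FALSE)

# The same tolerance expressed in units of the t distribution (assumed relation)
epsilon <- Delta / se

print(p)
print(all(p < alpha))
```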
Remarks
As described by Tryon and Lewis (2008), when rejection decisions from both tests for difference (e.g., \(\text{H}_{0}^{+}\text{: }\mu_{x}- \mu_{y} = 0\) or ) and tests for equivalence (e.g., either \(\text{H}_{0}^{-}\text{: }|\mu_{x}- \mu_{y}| \ge \Delta\), or \(\text{H}_{0}^{-}\text{: }|T| \ge \varepsilon\)) are combined, there are four possible interpretations for a given \(\alpha\) and \(\Delta\) or \(\varepsilon\):
One may reject \(\text{H}_{0}^{+}\), but fail to reject \(\text{H}_{0}^{-}\), and conclude that there is a relevant difference in means at least as large as \(\Delta\) or \(\varepsilon\).
One may fail to reject \(\text{H}_{0}^{+}\), but reject \(\text{H}_{0}^{-}\), and conclude that there is equivalence in means within the equivalence range (i.e. defined by \(\Delta\) or \(\varepsilon\)).
One may reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and conclude that there is a trivial difference in means which lies within the equivalence range (i.e. defined by \(\Delta\) or \(\varepsilon\)).
One may fail to reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and draw an indeterminate conclusion, because the data are underpowered to detect either difference or equivalence.
Value
tost.t returns:
statistics |
a vector of the t statistics for the two one-sided tests; if |
p.values |
a vector of p values for the t tests. |
estimate |
a scalar or vector of the estimated mean or means, mean difference, or difference in means depending on whether it was a one-sample test, paired test, or a two-sample test. |
null.value |
the specified hypothesized value of the mean in a one-sample test, or 0 for a paired test or two-sample test. |
sterr |
the standard error used in the denominator of the t statistic. |
sd |
a vector containing the sample standard deviations of the two variables or two groups in paired and unpaired tests; not returned for one-sample tests. |
sample_size |
a scalar (one-sample test) or vector (two-sample tests) containing the number of observations in the variable(s). |
parameter |
the degrees of freedom for the t statistics. |
threshold |
the value of the equivalence/relevance threshold: if |
conclusion |
a string containing the relevance test conclusion when |
Author(s)
Alexis Dinno (alexis.dinno@pdx.edu)
Please contact me with any questions, bug reports or suggestions for improvement. Fixing bugs will be facilitated by sending along:
a copy of the data (de-labeled or anonymized is fine),
a copy of the command syntax used, and
a copy of the exact output of the command.
I am indebted to my winter 2013 and fall 2023 students for their inspiration. Much appreciation to Mick McVeety for troubleshooting the translation of my Stata tost package to R.
Suggested citation
Dinno, A. 2025. tost.t: Mean-equivalence t tests. In: tost.suite R software package.
References
Satterthwaite, F. E. (1946) An approximate distribution of estimates of variance components. Biometrics Bulletin. 2, 110–114.
Schuirmann, D. A. (1987) A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics. 15, 657–680.
Tryon, W. W., and Lewis, C. (2008) An inferential confidence interval method of establishing statistical equivalence that corrects Tryon's (2001) reduction factor. Psychological Methods. 13, 272–277.
Welch, B. L. (1947) The generalization of "Student's" problem when several different population variances are involved. Biometrika. 34, 28–35.
See Also
Examples
require("webuse")
# Setup
webuse("auto")
# One-sample mean equivalence t test with asymmetric equivalence interval
tost.t(
x=auto$mpg,
mu=20,
eqv.type="delta",
eqv.level=2.5,
upper=3,
relevance=FALSE)
# Setup
webuse("fuel")
# Two-sample paired relevance t test of means; equivalence interval is
# +/- 1.5 sd beyond the critical value of T with df = 11 for alpha = 0.05
tost.t(
x=fuel$mpg1,
y=fuel$mpg2,
paired=TRUE,
eqv.type="epsilon",
eqv.level=qt(p=.95,df=11)+1.5*sqrt(11/9),
conf.level=0.95,
relevance=TRUE)
# Setup
webuse("fuel3")
# Two-group unpaired mean equivalence t test assuming equal variances
# Notice warning about value of Delta!
tost.t(
x=fuel3$mpg,
by=fuel3$treated,
eqv.type="delta",
eqv.level=1.5,
var.equal=TRUE,
relevance=FALSE)
# Same example but customizing output labels
tost.t(
x=fuel3$mpg,
by=fuel3$treated,
eqv.type="delta",
eqv.level=1.5,
var.equal=TRUE,
by.name="Fuel",
by.values=c("Treated", "Untreated"),
relevance=FALSE)
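The paired example above sets eqv.level to the critical value of t plus 1.5 standard deviations of the t distribution. The term sqrt(11/9) is the standard deviation of a t distribution with 11 degrees of freedom, since its variance is df/(df - 2) for df > 2. The sketch below checks that numerically in base R. The variable names are illustrative only.

```r
dof <- 11

# Closed-form standard deviation of a t distribution (valid for dof > 2)
sd_t <- sqrt(dof / (dof - 2))

# Numerical check: the mean is 0, so the variance is E[X^2]
var_num <- integrate(function(x) x^2 * dt(x, df = dof), -Inf, Inf)$value

# Threshold used in the paired example: critical value plus 1.5 sd
eps <- qt(p = 0.95, df = dof) + 1.5 * sd_t

print(c(sd_closed_form = sd_t, sd_numeric = sqrt(var_num), epsilon = eps))
```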
Immediate mean-equivalence t tests
Description
Immediately performs two one-sided t tests for mean equivalence
Usage
tost.ti(
n1=NA, mean1=NA, sd1=NA, mu=NA,
n2=NA, mean2=NA, sd2=NA,
eqv.type = equivalence.types,
eqv.level = 1,
upper = NA,
var.equal = FALSE,
welch = FALSE,
conf.level = 0.95,
x.name = "",
y.name = "",
relevance = TRUE)
equivalence.types
#c("delta", "epsilon")
Arguments
n1 |
a required positive integer value representing the sample size in group 1. |
mean1 |
a required real value representing the sample mean in group 1. |
sd1 |
a required non-negative real value representing the sample standard deviation (not standard error) in group 1. |
mu |
an optional real value representing the true value of the mean under the positivist null hypothesis for a one-sample test. Implies |
n2 |
an optional positive integer value representing the sample size in group 2. Implies |
mean2 |
an optional real value representing the sample mean in group 2. Implies |
sd2 |
an optional non-negative real value representing the sample standard deviation (not standard error) in group 2. Implies |
eqv.type |
defines whether the equivalence interval will be defined in terms of \(\Delta\) or \(\varepsilon\) ( |
eqv.level |
defines the equivalence threshold for the tests depending on whether |
upper |
defines the upper equivalence threshold for the test, is assumed to be positive, and transforms the meaning of |
var.equal |
a logical variable indicating whether to treat the two samples as being drawn from populations with equal variances. If |
welch |
a logical variable indicating |
conf.level |
confidence level of the interval, and complement of the test's nominal type I error rate \(\alpha\). |
x.name |
specifies how the first variable will be labeled in the output. The default value of |
y.name |
specifies how the second variable will be labeled in the output when |
relevance |
reports results and inference for combined tests for difference and for equivalence for a specific |
Details
Immediate commands perform tests given summary statistics, rather than given data. tost.ti tests for the equivalence of means within a symmetric equivalence interval defined by eqv.type and eqv.level using a two one-sided t tests (TOST) approach (Schuirmann, 1987). Typically, "positivist" null hypotheses are framed from an assumption of a lack of difference between two quantities, and reject this assumption only with sufficient evidence. When performing tests for equivalence, one instead frames a "negativist" null hypothesis with the assumption that two quantities differ by at least as much as some chosen level of tolerance, which defines the equivalence interval, and rejects this assumption only with sufficient evidence of equivalence.
With respect to an unpaired t test, an equivalence null hypothesis takes one of the following two forms depending on whether equivalence is defined in terms of \(\Delta\) (equivalence expressed in the same units as mean1 and mean2) or in terms of \(\varepsilon\) (equivalence expressed in the units of the t distribution with the given degrees of freedom):
\(\phantom{22}\text{H}_{0}^{-}\text{: }|\mu_{x} - \mu_y| \ge \Delta\),
\(\phantom{22}\)where the equivalence interval ranges from \(\left(\mu_x - \mu_y\right) - \Delta\) to \(\left(\mu_x - \mu_y\right) + \Delta\). This translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }\mu_{x} - \mu_y \ge \Delta\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }\mu_{x} - \mu_y \le -\Delta\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }|T| \ge \varepsilon ,\)
\(\phantom{22}\)where the equivalence interval ranges from \(-\varepsilon\) to \(\varepsilon\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }T \ge \varepsilon\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }T \le -\varepsilon\).
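As a rough illustration of the two symmetric forms above, the following base R sketch computes the two one-sided statistics and their upper-tail p values from a difference, its standard error, and the degrees of freedom. The helper tost_sketch() and its argument names are hypothetical and are not part of the package. The sign convention (each statistic is built so that large positive values count against its one-sided null hypothesis) is one common formulation and is an assumption here; tost.ti may sign or order its statistics differently.

```r
# Hypothetical helper, not part of the package: symmetric TOST p values
# from a difference d, its standard error se, and degrees of freedom df.
tost_sketch <- function(d, se, df, level, type = c("delta", "epsilon")) {
  type <- match.arg(type)
  if (type == "delta") {
    t1 <- (level - d) / se   # against H01: difference >= Delta
    t2 <- (d + level) / se   # against H02: difference <= -Delta
  } else {
    t  <- d / se             # ordinary t statistic T
    t1 <- level - t          # against H01: T >= epsilon
    t2 <- t + level          # against H02: T <= -epsilon
  }
  p <- pt(c(t1, t2), df = df, lower.tail = FALSE)
  list(statistics = c(t1 = t1, t2 = t2), p.values = p)
}

# One-sample summary statistics: n = 24, mean = 62.6, sd = 15.8, mu = 75
d  <- 62.6 - 75
se <- 15.8 / sqrt(24)
by_delta   <- tost_sketch(d, se, df = 23, level = 20, type = "delta")
by_epsilon <- tost_sketch(d, se, df = 23, level = 2,  type = "epsilon")

# Equivalence is concluded only if BOTH one-sided nulls are rejected
max(by_delta$p.values) < 0.05
max(by_epsilon$p.values) < 0.05
```

Taking the larger of the two p values gives the overall decision, because the general negativist null hypothesis is rejected only when both one-sided null hypotheses are rejected.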
When an asymmetric equivalence interval is defined using the upper option the general negativist null hypothesis becomes:
\(\phantom{22}\text{H}_{0}^{-}\text{: }\mu_{x} - \mu_y \le \Delta_{\text{lower}}\), or \(\mu_{x} - \mu_y \ge \Delta_{\text{upper}}\)
\(\phantom{22}\)where the equivalence interval ranges from \(\left(\mu_x - \mu_y\right) + \Delta_{\text{lower}}\) to \(\left(\mu_x - \mu_y\right) + \Delta_{\text{upper}}\). This also translates directly into two one-sided null hypotheses:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }\mu_x - \mu_y \ge \Delta_{\text{upper}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }\mu_x - \mu_y \le \Delta_{\text{lower}}\).
–OR–
\(\phantom{22}\text{H}_{0}^{-}\text{: }T \le \varepsilon_{\text{lower}}\), or \(T \ge \varepsilon_{\text{upper}}\), with:
\(\phantom{2222}\text{H}_{01}^{-}\text{: }T \ge \varepsilon_{\text{upper}}\); or
\(\phantom{2222}\text{H}_{02}^{-}\text{: }T \le \varepsilon_{\text{lower}}\).
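The asymmetric case can be sketched the same way in base R, with each one-sided test using its own bound. The helper tost_asym_sketch() and the bounds of -25 and 10 are hypothetical and chosen only for illustration. How tost.ti maps eqv.level and upper onto the lower and upper bounds is not assumed here, and the sign convention is one common formulation rather than a statement of the package's internals.

```r
# Hypothetical helper, not part of the package: TOST p values for an
# asymmetric equivalence interval [lower, upper] on the difference scale.
tost_asym_sketch <- function(d, se, df, lower, upper) {
  stopifnot(lower < upper)
  t1 <- (upper - d) / se   # against H01: difference >= Delta_upper
  t2 <- (d - lower) / se   # against H02: difference <= Delta_lower
  p <- pt(c(t1, t2), df = df, lower.tail = FALSE)
  names(p) <- c("p1", "p2")
  p
}

# Two-sample summary statistics: n1 = 24, mean1 = 62.6, sd1 = 15.8,
# n2 = 30, mean2 = 76.6, sd2 = 16.6 (unequal variances)
d  <- 62.6 - 76.6
se <- sqrt(15.8^2 / 24 + 16.6^2 / 30)

p_asym <- tost_asym_sketch(d, se, df = 50.3912, lower = -25, upper = 10)
p_sym  <- tost_asym_sketch(d, se, df = 50.3912, lower = -20, upper = 20)

all(p_asym < 0.05)  # decision for the asymmetric interval
all(p_sym  < 0.05)  # decision for the symmetric interval
```

Setting lower to the negative of upper recovers the symmetric \(\Delta\) form, so the two calls differ only in where the bounds are placed.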
NOTE: the appropriate level of \(\alpha = (1 - \)conf.level\()\) is precisely the same as in the corresponding two-sided test for mean difference, so that, for example, if one wishes to make a type I error 1% of the time, one simply conducts both of the one-sided tests of \(\text{H}_{01}^{-}\) and \(\text{H}_{02}^{-}\) by comparing each resulting p value to 0.01 (Tryon and Lewis, 2008).
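To make the note concrete, the sketch below applies \(\alpha = 1 - \)conf.level to each one-sided p value for a one-sample summary. The tolerance of 20, the variable names, and the helper reject_at() are illustrative assumptions, and the statistics are computed by hand in one common formulation rather than by calling tost.ti.

```r
# Sketch only: one-sample summary statistics n = 24, mean = 62.6, sd = 15.8,
# mu = 75, with an assumed tolerance of Delta = 20 in the units of the data.
n <- 24
d <- 62.6 - 75
se <- 15.8 / sqrt(n)
delta <- 20
p <- pt(c((delta - d) / se, (d + delta) / se), df = n - 1, lower.tail = FALSE)

# alpha is the complement of conf.level and is applied to each one-sided test
reject_at <- function(conf.level) max(p) < (1 - conf.level)
reject_at(0.95)  # TRUE: both p values are below 0.05
reject_at(0.99)  # FALSE: the larger p value is not below 0.01
```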
Remarks
As described by Tryon and Lewis (2008), when rejection decisions from both tests for difference (e.g., \(\text{H}_{0}^{+}\text{: }\mu_{x}- \mu_{y} = 0\)) and tests for equivalence (e.g., either \(\text{H}_{0}^{-}\text{: }|\mu_{x}- \mu_{y}| \ge \Delta\), or \(\text{H}_{0}^{-}\text{: }|T| \ge \varepsilon\)) are combined, there are four possible interpretations for a given \(\alpha\) and \(\Delta\) or \(\varepsilon\):
One may reject \(\text{H}_{0}^{+}\), but fail to reject \(\text{H}_{0}^{-}\), and conclude that there is a relevant difference in means at least as large as \(\Delta\) or \(\varepsilon\).
One may fail to reject \(\text{H}_{0}^{+}\), but reject \(\text{H}_{0}^{-}\), and conclude that there is equivalence in means within the equivalence range (i.e. defined by \(\Delta\) or \(\varepsilon\)).
One may reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and conclude that there is a trivial difference in means which lies within the equivalence range (i.e. defined by \(\Delta\) or \(\varepsilon\)).
One may fail to reject both \(\text{H}_{0}^{+}\) and \(\text{H}_{0}^{-}\), and draw an indeterminate conclusion, because the data are underpowered to detect either difference or equivalence.
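The four interpretations above amount to a lookup on two rejection decisions, which can be sketched in base R as follows. The function relevance_sketch() and the short labels it returns are hypothetical; they are not the strings that tost.ti places in its conclusion value.

```r
# Hypothetical helper, not part of the package: combine the rejection
# decisions for the difference null (H0+) and the equivalence null (H0-).
relevance_sketch <- function(reject_difference, reject_equivalence) {
  if (reject_difference && !reject_equivalence) {
    "relevant difference"
  } else if (!reject_difference && reject_equivalence) {
    "equivalence"
  } else if (reject_difference && reject_equivalence) {
    "trivial difference"
  } else {
    "indeterminate"
  }
}

# Example: p value for difference = 0.003, larger TOST p value = 0.09,
# both compared to alpha = 0.05 (illustrative numbers only)
alpha <- 0.05
relevance_sketch(0.003 < alpha, 0.09 < alpha)
```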
Value
tost.ti returns:
statistics |
a vector of the t statistics for the two one-sided tests; if |
p.values |
a vector of p values for the t tests. |
estimate |
the estimated mean or means, or difference in means depending on whether it was a one-sample test, or a two-sample test. |
null.value |
the specified hypothesized value of the mean in a one-sample test, or 0 for a paired test or two-sample test. |
sterr |
the standard error used in the denominator of the t statistic. |
sd |
a vector containing the sample standard deviations of the two variables or two groups in unpaired tests; not returned for one-sample tests. |
sample_size |
a scalar (one-sample test) or vector (two-sample tests) containing the number of observations in the variable(s). |
parameter |
the degrees of freedom for the t statistics. |
threshold |
the value of the equivalence/relevance threshold: if |
conclusion |
a string containing the relevance test conclusion when |
Author(s)
Alexis Dinno (alexis.dinno@pdx.edu)
Please contact me with any questions, bug reports or suggestions for improvement. Fixing bugs will be facilitated by sending along:
a copy of the data (de-labeled or anonymized is fine),
a copy of the command syntax used, and
a copy of the exact output of the command.
I am indebted to my winter 2013 and fall 2023 students for their inspiration. Much appreciation to Mick McVeety for troubleshooting the translation of my Stata tost package to R.
Suggested citation
Dinno, A. 2025. tost.ti: Mean-equivalence t tests. In: tost.suite R software package.
References
Satterthwaite, F. E. (1946) An approximate distribution of estimates of variance components. Biometrics Bulletin. 2, 110–114.
Schuirmann, D. A. (1987) A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics. 15, 657–680.
Tryon, W. W., and Lewis, C. (2008) An inferential confidence interval method of establishing statistical equivalence that corrects Tryon's (2001) reduction factor. Psychological Methods. 13, 272–277.
Welch, B. L. (1947) The generalization of "Student's" problem when several different population variances are involved. Biometrika. 34, 28–35.
See Also
Examples
# Immediate one-sample mean equivalence test
tost.ti(
n1=24,
mean1=62.6,
sd1=15.8,
mu=75,
eqv.type="delta",
eqv.level=20,
relevance=FALSE)
# Immediate two-sample relevance t test of means assuming unequal variances
# Note: n1=24 m1=62.6 sd1=15.8 n2=30 m2=76.6 sd2=16.6
# Satterthwaite's df = 50.3912, and equivalence interval is +/- 1.5 sd
# beyond the critical value of T with df = 50.3912
tost.ti(
n1=24, mean1=62.6, sd1=15.8,
n2=30, mean2=76.6, sd2=16.6,
eqv.type="epsilon",
eqv.level=qt(.95, df=50.3912)+1.5*sqrt(50.3912/(50.3912-2)),
x.name="Intervention",
y.name="Control",
conf.level=0.95,
relevance=TRUE)
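The constants quoted in the two-sample example can be reproduced in base R. The sketch below recomputes Satterthwaite's approximate degrees of freedom from the summary statistics and then the \(\varepsilon\) threshold passed as eqv.level. The variable names are illustrative only, and the calculation does not call any function from the package.

```r
# Summary statistics from the two-sample example
n1 <- 24; sd1 <- 15.8
n2 <- 30; sd2 <- 16.6

# Satterthwaite's approximate degrees of freedom for unequal variances
v1 <- sd1^2 / n1
v2 <- sd2^2 / n2
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
round(df, 4)

# Standard deviation of a t distribution with df degrees of freedom
sd_t <- sqrt(df / (df - 2))

# Threshold: 1.5 sd beyond the upper 5% critical value of T
eps <- qt(0.95, df = df) + 1.5 * sd_t
eps
```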