This vignette documents the features added in rdrobust 4.0.0:
data = argument for rdrobust(),
rdbwselect(), and rdplot(), letting you refer
to variables by their bare names inside a data frame.vce = "hc0", "hc1", "hc2",
"hc3").vce = "cr1",
"cr2", "cr3") with automatic validation.plot method for rdrobust objects
(plot.rdrobust).hide = TRUE flag and vars_bins /
vars_poly / coef outputs from
rdplot(), so you can build custom RD plots on top of
ggplot2 (or any other graphics system) without
re-implementing the binning logic.tidy / glance methods for integration with
the broom package.Existing functionality — bandwidth selection, covariates, kernels,
fuzzy and kink designs — is unchanged. See ?rdrobust and
?rdbwselect for full documentation.
We use the U.S. Senate incumbency advantage dataset included in the
package. The running variable is the Democratic vote margin at election
\(t\); the outcome is the Democratic
vote share at election \(t+2\). The
state variable serves as a natural cluster identifier,
since elections within the same state share common political and
institutional features.
library(rdrobust)
data(rdrobust_RDsenate)
df <- rdrobust_RDsenate
head(df)
#> state year margin vote class termshouse termssenate dopen
#> 1 Connecticut 1914 -7.6885610 36.09757 3 3 6 0
#> 2 Connecticut 1916 -3.9237082 45.46875 1 0 4 0
#> 3 Connecticut 1922 -6.8686604 45.59821 1 0 7 0
#> 4 Connecticut 1926 -27.6680560 48.47606 3 0 3 0
#> 5 Connecticut 1928 -8.2569685 51.74687 1 0 1 1
#> 6 Connecticut 1932 0.7324815 39.80264 3 4 1 0
#> population presdemvoteshlag1 demvoteshlag1 demvoteshlag2 demvoteshfor1
#> 1 1233000 39.15937 NA NA 46.23941
#> 2 1294000 39.15937 42.07694 NA 36.09757
#> 3 1431000 33.02737 36.09757 46.23941 35.64121
#> 4 1531000 27.52570 45.46875 36.09757 45.59821
#> 5 1577000 27.52570 35.64121 45.46875 48.47606
#> 6 1637000 45.57480 45.59821 35.64121 51.74687
#> demwinprv1 demwinprv2 dmidterm dpresdem
#> 1 NA NA 1 1
#> 2 0 NA 0 1
#> 3 0 0 1 0
#> 4 0 0 1 0
#> 5 0 0 0 0
#> 6 0 0 0 0data = argumentPreviously, callers either had to attach() the data
frame or pass each column with df$ prefixing. In 4.0.0,
rdrobust(), rdbwselect(), and
rdplot() accept a data = argument so
y, x, covs, cluster,
fuzzy, weights, and subset can be
given as bare names, resolved against the supplied data frame:
r <- rdrobust(y = vote, x = margin, data = df)
round(r$coef, 3)
#> Coeff
#> Conventional 7.414
#> Bias-Corrected 7.507
#> Robust 7.507It also works for expressions built from columns
(e.g. cbind() for multiple covariates, comparisons for
subset):
r_cov <- rdrobust(y = vote, x = margin,
covs = cbind(class, termshouse, termssenate),
data = df)
round(r_cov$coef, 3)
#> Coeff
#> Conventional 6.850
#> Bias-Corrected 6.988
#> Robust 6.988
# subset via an expression evaluated inside `data`
r_sub <- rdrobust(y = vote, x = margin,
subset = year >= 1950, data = df)
round(r_sub$coef, 3)
#> Coeff
#> Conventional 9.223
#> Bias-Corrected 9.295
#> Robust 9.295The same pattern works for rdbwselect() and
rdplot(). For the rest of the vignette we pass variables
through data =.
covs =: formula, names, or matrixThe covs argument accepts three forms, so the way you
specify covariates can match the shape of your problem:
One-sided formula — recommended when you have factors, interactions, or transformations:
df$classf <- factor(df$class, labels = c("A", "B", "C"))
r_fm <- rdrobust(y = vote, x = margin,
covs = ~ classf + termshouse + I(termssenate^2),
data = df)
round(r_fm$coef, 3)
#> Coeff
#> Conventional 6.851
#> Bias-Corrected 6.982
#> Robust 6.982Factors are expanded to contrast dummies via
model.matrix(); interactions (~ z1 * z2) and
transformations (I(...), log(),
poly()) all work, and the intercept column is dropped
automatically. Symbols missing from data fall through to
the caller’s environment.
Character vector of column names — convenient for programmatic specifications:
covs_names <- c("class", "termshouse", "termssenate")
r_ch <- rdrobust(y = vote, x = margin, covs = covs_names, data = df)
round(r_ch$coef, 3)
#> Coeff
#> Conventional 6.850
#> Bias-Corrected 6.988
#> Robust 6.988Missing column names give a clear error
(Column(s) not found in 'data': ...). Passing a character
vector without data = is rejected up front instead of
failing deep inside the estimator.
Numeric vector, matrix, or data frame — fully backwards compatible:
All three forms — formula with the same plain terms, character vector, and pre-built matrix — yield identical results when the covariates are numeric and the expansion is trivial; the formula form is the right choice when factor expansion or interactions matter.
rdplot(y = vote, x = margin, data = df,
title = "Senate: incumbency advantage",
x.label = "Vote margin (t)",
y.label = "Vote share (t+2)")The full HC0–HC3 family is available via
vce =. HC0 is the plug-in sandwich;
HC1 adds a finite-sample scalar; HC2 and
HC3 reweight residuals by leverage, with HC3
being the most conservative.
hc_fits <- lapply(c("nn", "hc0", "hc1", "hc2", "hc3"), function(v) {
rdrobust(y = vote, x = margin, vce = v, data = df)
})
names(hc_fits) <- c("NN", "HC0", "HC1", "HC2", "HC3")
hc_tbl <- do.call(rbind, lapply(names(hc_fits), function(nm) {
r <- hc_fits[[nm]]
data.frame(vce = nm,
coef = round(r$coef[2], 3), # bias-corrected
se_rb = round(r$se[3], 3), # robust SE
pval = round(r$pv[3], 3))
}))
print(hc_tbl, row.names = FALSE)
#> vce coef se_rb pval
#> NN 7.507 1.741 0
#> HC0 7.506 1.739 0
#> HC1 7.504 1.744 0
#> HC2 7.502 1.747 0
#> HC3 7.497 1.755 0As expected, the point estimate is identical across HC variants (the variance estimator does not change it); only the standard errors and p-values move.
When observations are grouped into clusters, standard errors should account for within-cluster correlation. Version 4.0.0 adds three cluster-robust variance estimators:
vce |
Description |
|---|---|
"cr1" |
CR1 — cluster sandwich with degrees-of-freedom correction; the
default when cluster is supplied |
"cr2" |
CR2 — Bell & McCaffrey (2002); accounts for within-cluster leverage |
"cr3" |
CR3 — Pustejovsky & Tipton (2018); leave-one-cluster-out jackknife; most conservative |
r_cr1 <- rdrobust(y = vote, x = margin, cluster = state, vce = "cr1", data = df)
r_cr2 <- rdrobust(y = vote, x = margin, cluster = state, vce = "cr2", data = df)
r_cr3 <- rdrobust(y = vote, x = margin, cluster = state, vce = "cr3", data = df)
cl_tbl <- data.frame(
vce = c("CR1", "CR2", "CR3"),
coef = c(r_cr1$coef[2], r_cr2$coef[2], r_cr3$coef[2]),
se_rb = c(r_cr1$se[3], r_cr2$se[3], r_cr3$se[3]),
pval = c(r_cr1$pv[3], r_cr2$pv[3], r_cr3$pv[3])
)
cl_tbl[, 2:4] <- round(cl_tbl[, 2:4], 3)
print(cl_tbl, row.names = FALSE)
#> vce coef se_rb pval
#> CR1 7.505 1.796 0
#> CR2 7.501 1.800 0
#> CR3 7.494 1.826 0The point estimate (bias-corrected) is the same across estimators — only the standard errors differ. CR3 produces the widest intervals and is the safest default when the number of clusters is moderate.
When cluster is supplied, only
cr1/cr2/cr3 are valid; the HC
variants and nn are silently remapped (or warned and
remapped, for HC) to the corresponding cluster-robust estimator:
vce = "nn" + cluster → cr1 (silent
default)vce = "hc0" + cluster → cr1 (with
warning)vce = "hc1" + cluster → cr1 (with
warning)vce = "hc2" + cluster → cr2 (with
warning)vce = "hc3" + cluster → cr3 (with
warning)Conversely, asking for cr1/cr2/cr3 without supplying a
cluster falls back to the corresponding HC variant with a
warning.
summary(r_cr3)
#> Call: rdrobust
#>
#> Sharp RD estimates using local polynomial regression. Std. errors are clustered (100 clusters).
#>
#> Number of Obs. 1297
#> BW type mserd
#> Kernel Triangular
#> VCE method CR3
#>
#> Left Right
#> Number of Obs. 595 702
#> Eff. Number of Obs. 369 328
#> Order est. (p) 1 1
#> Order bias (q) 2 2
#> BW est. (h) 18.259 18.259
#> BW bias (b) 27.844 27.844
#> rho (h/b) 0.656 0.656
#> Clusters (g) 50 50
#> Unique Obs. 595 665
#>
#> =====================================================================
#> Point Robust Inference
#> Estimate z P>|z| [ 95% C.I. ]
#> ---------------------------------------------------------------------
#> RD Effect 7.386 4.105 0.000 [3.916 , 11.072]
#> =====================================================================plot.rdrobust: Visual Summary of RD ResultsVersion 4.0.0 adds a plot S3 method for
rdrobust objects. It produces a binscatter with polynomial
fits and confidence bands on a common scale.
Signature:
plot(x, y, x_run, nbins = 20, ci = TRUE, show_effect = FALSE,
title = NULL, x.label = "Running Variable", y.label = "Outcome", ...)plot(r_cr1, df$vote, df$margin,
title = "Senate: incumbency advantage",
x.label = "Vote margin (t)",
y.label = "Vote share (t+2)")show_effect = TRUE appends a second panel with the
conventional point estimate, the robust bias-corrected confidence
interval, and significance stars.
plot(r_cr1, df$vote, df$margin,
show_effect = TRUE,
title = "Senate: incumbency advantage",
x.label = "Vote margin (t)",
y.label = "Vote share (t+2)")rdplot()rdplot() now accepts a hide = TRUE flag and
returns the bin-level data used to draw the plot, so you can build any
visualization on top of ggplot2, lattice, or
base graphics without re-implementing the binning logic.
The returned list contains:
vars_bins — one row per bin: bin midpoint
(rdplot_mean_bin), within-bin mean of y
(rdplot_mean_y), within-bin standard error
(rdplot_se_y), and pointwise CI endpoints
(rdplot_ci_l / rdplot_ci_r) when
ci = is set.vars_poly — a fine evaluation grid
(rdplot_x) with the fitted polynomial values on each side
of the cutoff (rdplot_y).coef — the per-side polynomial coefficient matrix
(Left / Right columns) that produced
vars_poly.out <- rdplot(y = vote, x = margin, c = 0,
hide = TRUE, ci = 95, data = df)
str(out$vars_bins, give.attr = FALSE)
#> 'data.frame': 49 obs. of 9 variables:
#> $ rdplot_mean_bin: num -96.7 -90 -83.3 -76.7 -70 ...
#> $ rdplot_mean_x : num -98.4 -87.7 -83.2 -79.3 -67.9 ...
#> $ rdplot_mean_y : num 25.4 57 42.3 43.5 43.9 ...
#> $ rdplot_min_bin : num -100 -93.3 -86.7 -80 -73.3 ...
#> $ rdplot_max_bin : num -93.3 -86.7 -80 -73.3 -66.7 ...
#> $ rdplot_se_y : num 10.12 0 6.77 0 3.86 ...
#> $ rdplot_N : int 4 1 5 1 2 4 9 11 30 30 ...
#> $ rdplot_ci_l : num -6.76 57.01 23.54 43.5 -5.07 ...
#> $ rdplot_ci_r : num 57.6 57 61.1 43.5 92.9 ...
str(out$vars_poly, give.attr = FALSE)
#> 'data.frame': 1000 obs. of 2 variables:
#> $ rdplot_x: num -100 -99.8 -99.6 -99.4 -99.2 ...
#> $ rdplot_y: num 29.8 30 30.2 30.5 30.7 ...
out$coef
#> Left Right
#> [1,] 43.9372949735 5.334437e+01
#> [2,] -0.3118100526 5.642182e-02
#> [3,] -0.0371985718 8.489156e-03
#> [4,] -0.0007184850 -5.409779e-05
#> [5,] -0.0000039186 -4.578999e-09A minimal custom plot using ggplot2:
library(ggplot2)
poly_l <- subset(out$vars_poly, rdplot_x < 0)
poly_r <- subset(out$vars_poly, rdplot_x >= 0)
ggplot() +
geom_vline(xintercept = 0, linewidth = 0.4) +
geom_errorbar(data = out$vars_bins,
aes(x = rdplot_mean_bin,
ymin = rdplot_ci_l, ymax = rdplot_ci_r),
color = "grey60", width = 0) +
geom_point(data = out$vars_bins,
aes(x = rdplot_mean_bin, y = rdplot_mean_y),
color = "grey40", size = 1.5) +
geom_line(data = poly_l, aes(x = rdplot_x, y = rdplot_y)) +
geom_line(data = poly_r, aes(x = rdplot_x, y = rdplot_y)) +
theme_bw() +
labs(x = "Vote margin (t)", y = "Vote share (t+2)",
title = "Senate: regression function fit")You can also reconstruct the fitted polynomial on a custom x-range
directly from coef, which is convenient when the default
evaluation grid doesn’t span the range you want:
eval_fit <- function(xs, coef, c)
sapply(xs, function(xi) sum(coef * (xi - c) ^ (0:(length(coef) - 1))))
xs <- seq(-50, 50, length.out = 5)
data.frame(
x = xs,
y_left = ifelse(xs < 0, eval_fit(xs, out$coef[, "Left"], 0), NA),
y_right = ifelse(xs >= 0, eval_fit(xs, out$coef[, "Right"], 0), NA)
)
#> x y_left y_right
#> 1 -50 31.85074 NA
#> 2 -25 38.17906 NA
#> 3 0 NA 53.34437
#> 4 25 NA 59.21357
#> 5 50 NA 70.59751The full set of worked examples (default plot, error-bar variant,
shaded CI ribbons, coefficient-based reconstruction) is in
rdplot_illustration.R at the project root, with Python and
Stata twins (rdplot_illustration.py,
rdplot_illustration.do).
rdrobust 4.0.0 registers tidy and
glance methods so that results integrate seamlessly with
the broom package and tidyverse workflows.
tidy(): Coefficient-level outputtidy() returns one row per estimator (Conventional,
Bias-Corrected, Robust), with standard broom column names:
library(broom)
tidy(r_cr3)
#> term estimate std.error statistic p.value conf.low conf.high
#> 1 Conventional 7.386333 1.559265 4.737061 2.168401e-06 4.330230 10.44244
#> 2 Bias-Corrected 7.494018 1.559265 4.806122 1.538859e-06 4.437915 10.55012
#> 3 Robust 7.494018 1.825625 4.104905 4.044813e-05 3.915858 11.07218For most applications, the Robust row (row 3) is the recommended inference — it combines the bias-corrected point estimate with robust standard errors.
glance(): Model-level summariesglance() returns a single-row data frame with key
estimation details: sample sizes, bandwidths, VCE method, and whether
covariates or clusters were used.
glance(r_cr3)
#> nobs.left nobs.right nobs.effective.left nobs.effective.right h.left h.right
#> 1 595 702 369 328 18.2592 18.2592
#> b.left b.right cutoff order.regression order.bias kernel vce bwselect
#> 1 27.84356 27.84356 0 1 2 Triangular CR3 mserd
#> covariates clusters.left clusters.right
#> 1 FALSE 50 50The main payoff of broom integration is easy comparison across specifications. Here we compare all three cluster-robust estimators:
specs <- list(CR1 = r_cr1, CR2 = r_cr2, CR3 = r_cr3)
# Robust BC estimates side-by-side
res <- do.call(rbind, lapply(names(specs), function(nm) {
cbind(spec = nm, tidy(specs[[nm]])[3, ]) # row 3 = robust
}))
res[, c("spec", "estimate", "std.error", "conf.low", "conf.high")]
#> spec estimate std.error conf.low conf.high
#> 3 CR1 7.504956 1.795945 3.984970 11.02494
#> 31 CR2 7.501257 1.800322 3.972691 11.02982
#> 32 CR3 7.494018 1.825625 3.915858 11.07218We can also compare model-level details:
gl <- do.call(rbind, lapply(names(specs), function(nm) {
cbind(spec = nm, glance(specs[[nm]]))
}))
gl[, c("spec", "nobs.effective.left", "nobs.effective.right",
"h.left", "vce", "clusters.left", "clusters.right")]
#> spec nobs.effective.left nobs.effective.right h.left vce clusters.left
#> 1 CR1 366 325 18.08429 CR1 50
#> 2 CR2 367 326 18.14594 CR2 50
#> 3 CR3 369 328 18.25920 CR3 50
#> clusters.right
#> 1 50
#> 2 50
#> 3 50A common workflow is comparing heteroskedasticity-robust and cluster-robust results to assess sensitivity to within-cluster correlation:
r_hc3 <- rdrobust(y = vote, x = margin, vce = "hc3", data = df)
all_specs <- list("HC3" = r_hc3,
"CR1" = r_cr1,
"CR2" = r_cr2,
"CR3" = r_cr3)
comparison <- do.call(rbind, lapply(names(all_specs), function(nm) {
g <- glance(all_specs[[nm]])
t <- tidy(all_specs[[nm]])[3, ] # robust row
data.frame(spec = nm,
estimate = round(t$estimate, 3),
std.error = round(t$std.error, 3),
ci = paste0("[", round(t$conf.low, 3), ", ", round(t$conf.high, 3), "]"),
h = round(g$h.left, 3),
clusters = g$clusters.left)
}))
print(comparison, row.names = FALSE)
#> spec estimate std.error ci h clusters
#> HC3 7.497 1.755 [4.058, 10.936] 17.766 NA
#> CR1 7.505 1.796 [3.985, 11.025] 18.084 50
#> CR2 7.501 1.800 [3.973, 11.03] 18.146 50
#> CR3 7.494 1.826 [3.916, 11.072] 18.259 50A common workflow is to present a small panel of specifications
(sharp, covariate-adjusted, cluster-robust, etc.) in a single table. The
canonical Calonico-Cattaneo-Titiunik presentation pairs the
conventional point estimate with the robust
bias-corrected (RBC) confidence interval — the point estimate
is from row 1 of coef, the CI is row 3 of ci.
The same pattern appears in the Stata illustration
(rdrobust_illustration_new_features.do), built there with
esttab; the R analog uses knitr::kable().
We fit four specifications side-by-side: sharp, sharp+covs, cluster, cluster+covs.
covs_form <- ~ class + termshouse + termssenate
m1 <- rdrobust(y = vote, x = margin, data = df)
m2 <- rdrobust(y = vote, x = margin, covs = covs_form, data = df)
m3 <- rdrobust(y = vote, x = margin, vce = "cr1", cluster = state, data = df)
m4 <- rdrobust(y = vote, x = margin, vce = "cr1", cluster = state,
covs = covs_form, data = df)A small helper extracts the conventional point estimate, the RBC confidence interval, and a few model-level stats from a fitted object:
rdr_row <- function(fit, label) {
data.frame(
spec = label,
estimate = fit$coef[1, 1], # tau_conv
rbc_ci = sprintf("[%.3f, %.3f]", fit$ci[3, 1], fit$ci[3, 2]),
rbc_pval = fit$pv[3, 1],
h = fit$bws[1, 1], # h_left
N_eff = fit$N_h[1] + fit$N_h[2],
clusters = if (!is.null(fit$n_clust)) sum(fit$n_clust) else NA_integer_
)
}
tab <- rbind(
rdr_row(m1, "sharp"),
rdr_row(m2, "sharp + covs"),
rdr_row(m3, "cluster"),
rdr_row(m4, "cluster + covs")
)Render with knitr::kable() for the report:
knitr::kable(
tab,
digits = 3,
col.names = c("Specification", "Estimate", "Robust 95% CI",
"RBC p-value", "h", "Eff. N", "Clusters"),
caption = "RD estimates: conventional point estimate with RBC confidence interval."
)| Specification | Estimate | Robust 95% CI | RBC p-value | h | Eff. N | Clusters |
|---|---|---|---|---|---|---|
| sharp | 7.414 | [4.094, 10.919] | 0 | 17.754 | 683 | NA |
| sharp + covs | 6.850 | [3.728, 10.249] | 0 | 18.033 | 598 | NA |
| cluster | 7.395 | [3.985, 11.025] | 0 | 18.084 | 691 | 100 |
| cluster + covs | 6.832 | [3.726, 10.217] | 0 | 18.431 | 612 | 98 |
The Estimate column reports tau_conv (the standard
local-polynomial point estimate); the Robust 95% CI is
the bias-corrected, robust-SE confidence interval — the inference
recommended by CCT 2014 and the same one printed by
summary(fit) in the Robust row. The CI is centered on
tau_bc, not on tau_conv, which is why a CI may
not be symmetric about the reported point estimate. This is a feature of
robust bias-corrected inference, not a typo.
For a typeset table closer to the Stata esttab layout
(specifications as columns, statistics as rows), transpose:
paper <- t(tab[, c("estimate", "rbc_ci", "h", "N_eff")])
colnames(paper) <- tab$spec
rownames(paper) <- c("RD effect (conv.)", "Robust 95% CI", "h", "Eff. N")
knitr::kable(
paper,
caption = "Senate incumbency: RD estimates across specifications."
)| sharp | sharp + covs | cluster | cluster + covs | |
|---|---|---|---|---|
| RD effect (conv.) | 7.414131 | 6.849875 | 7.394838 | 6.832365 |
| Robust 95% CI | [4.094, 10.919] | [3.728, 10.249] | [3.985, 11.025] | [3.726, 10.217] |
| h | 17.75440 | 18.03336 | 18.08429 | 18.43127 |
| Eff. N | 683 | 598 | 691 | 612 |
To write a publication-ready LaTeX table to a file (the R analog of
the Stata esttab ... using rdrobust_table.tex):
tex <- knitr::kable(
paper,
format = "latex",
booktabs = TRUE,
caption = "RD estimates for Senate incumbency.",
label = "tab:rdrobust"
)
writeLines(tex, "rdrobust_table.tex")For richer formatting (multi-row headers, footnotes, grouping), the
kableExtra and gt packages build on top of
kable() / gt::gt() and accept the same input
data frame.
sessionInfo()
#> R version 4.6.0 (2026-04-24 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#>
#> Matrix products: default
#> LAPACK version 3.12.1
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: America/Los_Angeles
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] broom_1.0.12 ggplot2_4.0.3 rdrobust_4.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 jsonlite_2.0.0 dplyr_1.2.1 compiler_4.6.0
#> [5] tidyselect_1.2.1 gridExtra_2.3 tidyr_1.3.2 jquerylib_0.1.4
#> [9] scales_1.4.0 yaml_2.3.12 fastmap_1.2.0 R6_2.6.1
#> [13] labeling_0.4.3 generics_0.1.4 knitr_1.51 MASS_7.3-65
#> [17] backports_1.5.1 tibble_3.3.1 bslib_0.10.0 pillar_1.11.1
#> [21] RColorBrewer_1.1-3 rlang_1.2.0 cachem_1.1.0 xfun_0.57
#> [25] sass_0.4.10 S7_0.2.2 cli_3.6.6 withr_3.0.2
#> [29] magrittr_2.0.5 digest_0.6.39 grid_4.6.0 lifecycle_1.0.5
#> [33] vctrs_0.7.3 evaluate_1.0.5 glue_1.8.1 farver_2.1.2
#> [37] rmarkdown_2.31 purrr_1.2.2 tools_4.6.0 pkgconfig_2.0.3
#> [41] htmltools_0.5.9