Introduction to the dcce Package: DCCE Estimation for Panel Data

Mustapha Wasseja

2026-05-02

1 Introduction

Macro-panel datasets—large panels of countries, regions, or industries observed over many time periods—are increasingly central to empirical economic research. A pervasive feature of such panels is cross-sectional dependence (CSD): the residuals of unit \(i\) are correlated with the residuals of unit \(j\) even after controlling for observable explanatory variables. Ignoring CSD causes standard panel estimators to produce inconsistent estimates and badly sized tests.

The dcce package implements the family of Common Correlated Effects (CCE) estimators introduced by Pesaran (2006) and extended to dynamic panels by Chudik and Pesaran (2015). These estimators filter out unobserved common factors—the source of CSD—by augmenting each unit’s regression with cross-sectional averages (CSAs) of the dependent variable and the regressors. The package further implements long-run estimators (CS-ARDL, CS-DL), an extensive CD test suite, the exponent of cross-sectional dependence, and information criteria for CSA lag selection.

Key features:

  - Static and dynamic CCE estimation: MG, CCE-MG, DCCE, and regularized CCE
  - Long-run estimators: CS-DL, CS-ARDL, and PMG
  - A suite of cross-sectional dependence (CD) tests via pcd_test()
  - Estimation of the exponent of cross-sectional dependence
  - Information criteria for CSA lag selection
  - Bootstrap inference

library(dcce)
#> 
#> Attaching package: 'dcce'
#> The following object is masked from 'package:stats':
#> 
#>     D
library(generics)
#> 
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union

2 The Problem: Cross-Sectional Dependence

2.1 What is cross-sectional dependence?

In a panel with \(N\) units and \(T\) time periods, suppose the data generating process is:

\[ y_{it} = \alpha_i + \boldsymbol{\beta}_i' \mathbf{x}_{it} + \varepsilon_{it}, \]

where \(\varepsilon_{it}\) is the error term. Standard panel estimators assume \(\mathbb{E}[\varepsilon_{it}\varepsilon_{jt}] = 0\) for \(i \neq j\). In macro panels, this assumption is routinely violated because units share common shocks (global business cycles, oil price shocks, financial crises) and are connected via trade, capital flows, and technology spillovers.

2.2 Consequences of ignoring CSD

When the errors contain unobserved common factors, \[ \varepsilon_{it} = \boldsymbol{\lambda}_i' \mathbf{f}_t + u_{it}, \] standard fixed-effects and first-differencing estimators are inconsistent if the regressors are also driven by \(\mathbf{f}_t\)—a situation called strong CSD. In that case, \(\mathbf{x}_{it}\) and \(\varepsilon_{it}\) share a common component (\(\boldsymbol{\lambda}_i' \mathbf{f}_t\)), inducing endogeneity. Pesaran (2006) shows the resulting bias can be substantial.

2.3 Testing for CSD: the pcd_test() function

Before estimating, we should test for CSD. The Pesaran (2015) CD statistic is the benchmark:

\[ \text{CD} = \sqrt{\frac{2}{N(N-1)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \sqrt{T_{ij}}\, \hat{\rho}_{ij} \xrightarrow{d} \mathcal{N}(0,1), \]

where \(\hat{\rho}_{ij}\) is the sample correlation of the residuals of units \(i\) and \(j\), and \(T_{ij}\) is the number of time periods over which both units are observed. Under the null of no CSD, \(\text{CD} \to \mathcal{N}(0,1)\) as \(N, T \to \infty\).
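To make the formula concrete, here is a minimal base-R sketch of the balanced-panel version of the statistic (illustrative only, not the package's pcd_test() implementation):

```r
# CD statistic for a balanced N x T residual matrix (no missing values):
# CD = sqrt(2T / (N(N-1))) * sum over i < j of rho_ij
cd_statistic <- function(E) {
  N <- nrow(E); T_len <- ncol(E)
  R <- cor(t(E))                    # N x N matrix of pairwise correlations
  sqrt(2 * T_len / (N * (N - 1))) * sum(R[upper.tri(R)])
}

set.seed(1)
E_indep  <- matrix(rnorm(50 * 200), nrow = 50)      # independent errors
cd_indep <- cd_statistic(E_indep)                   # approximately N(0, 1)

f      <- rnorm(200)                                # a common factor
E_dep  <- E_indep + outer(runif(50, 0.5, 1.5), f)   # strong CSD
cd_dep <- cd_statistic(E_dep)                       # far outside +/- 1.96

c(independent = cd_indep, common_factor = cd_dep)
```

Under independence the statistic behaves like a standard normal draw; adding a single common factor pushes it far into the rejection region.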

We illustrate with the pwt8 dataset: 93 countries, 1960–2007, adapted from Ditzen (2018).

data(pwt8, package = "dcce")
str(pwt8)
#> 'data.frame':    4464 obs. of  8 variables:
#>  $ id         : num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ year       : int  1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 ...
#>  $ log_rgdpo  : num  7.78 7.79 7.8 7.75 7.79 ...
#>  $ log_hc     : num  0.704 0.71 0.715 0.72 0.726 ...
#>  $ log_ck     : num  11.3 11.4 11.5 11.4 11.5 ...
#>  $ log_ngd    : num  NA -2.71 -2.72 -2.72 -2.73 ...
#>  $ country    : chr  "1" "1" "1" "1" ...
#>  $ d_log_rgdpo: num  NA 0.01216 0.00821 -0.04934 0.03554 ...

Run a naive pooled OLS of GDP growth on factor inputs, ignoring CSD:

# Pooled OLS on complete cases
pwt_cc <- na.omit(pwt8[, c("country", "year", "d_log_rgdpo",
                             "log_hc", "log_ck", "log_ngd")])
fit_ols <- lm(d_log_rgdpo ~ log_hc + log_ck + log_ngd, data = pwt_cc)

# CD test on OLS residuals
cd_ols <- pcd_test(
  residuals(fit_ols), data = pwt_cc,
  unit_index = "country", time_index = "year",
  test = "pesaran"
)
print(cd_ols)
#> 
#> Cross-Sectional Dependence Tests
#> ================================
#> N = 93, T = 47
#> 
#> CD        statistic =  38.7017  p-value = 0.0000

The CD statistic is far above the critical value (\(\approx 1.96\)), decisively rejecting the null of no cross-sectional dependence. Ignoring this leads to spurious inference.

3 The Model: Heterogeneous Panels with Common Factors

3.1 The general DGP

dcce works with the heterogeneous panel model with unobserved common factors:

\[ y_{it} = \alpha_i + \boldsymbol{\beta}_i' \mathbf{x}_{it} + \boldsymbol{\lambda}_i' \mathbf{f}_t + u_{it}, \quad i = 1,\ldots,N, \quad t = 1,\ldots,T, \]

where:

  - \(\alpha_i\) is a unit-specific intercept,
  - \(\boldsymbol{\beta}_i\) is a vector of heterogeneous slope coefficients,
  - \(\mathbf{f}_t\) is a vector of unobserved common factors with unit-specific loadings \(\boldsymbol{\lambda}_i\), and
  - \(u_{it}\) is the idiosyncratic error.

The regressors follow: \[ \mathbf{x}_{it} = \mathbf{A}_i' \mathbf{g}_t + \boldsymbol{\Gamma}_i' \mathbf{f}_t + \mathbf{v}_{it}, \] so the common factors \(\mathbf{f}_t\) drive both \(y_{it}\) and \(\mathbf{x}_{it}\), inducing CSD and endogeneity.

3.2 The CCE approach

Pesaran (2006) shows that the unobserved \(\mathbf{f}_t\) can be asymptotically spanned by the cross-sectional averages:

\[ \bar{y}_t = N^{-1}\sum_i y_{it}, \quad \bar{\mathbf{x}}_t = N^{-1}\sum_i \mathbf{x}_{it}. \]

These CSAs are observable proxies for the common factors. Augmenting each unit’s regression with the CSAs (and their lags, for dynamic panels) removes the common factor component and restores consistency.
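A minimal simulation makes the point, assuming a single factor and true \(\beta = 1\) for every unit (a base-R sketch of the idea, not the package's estimator):

```r
set.seed(2)
N <- 100; T_len <- 100
f      <- rnorm(T_len)                    # unobserved common factor
lambda <- runif(N, 0.5, 1.5)              # loadings in the y equation
Gamma  <- runif(N, 0.5, 1.5)              # loadings in the x equation
x <- outer(Gamma, f) + matrix(rnorm(N * T_len), N)        # x driven by f_t
y <- x + outer(lambda, f) + matrix(rnorm(N * T_len), N)   # true beta = 1

# Naive pooled OLS: x_it and the error share f_t, so the slope is biased up
b_ols <- coef(lm(as.vector(y) ~ as.vector(x)))[2]

# CCE-MG: augment each unit's regression with ybar_t and xbar_t, then average
ybar <- colMeans(y); xbar <- colMeans(x)
b_cce <- mean(sapply(1:N, function(i)
  coef(lm(y[i, ] ~ x[i, ] + ybar + xbar))[2]))

round(c(pooled_ols = unname(b_ols), cce_mg = b_cce), 3)
```

The pooled OLS slope is pushed well above 1 by the shared factor, while the CSA-augmented mean-group estimate recovers the true coefficient.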

3.3 Dynamic extension (DCCE)

For dynamic panels where \(y_{i,t-1}\) appears as a regressor, Chudik and Pesaran (2015) show that the CCE estimator requires CSA lags to remain consistent. The recommended number of lags is \(p_T = \lfloor T^{1/3} \rfloor\) (roughly 3 for \(T \approx 48\)). This gives the DCCE estimator.
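The rule of thumb is easy to apply:

```r
# Number of CSA lags suggested by p_T = floor(T^(1/3))
T_vals <- c(20, 30, 48, 100)
p_T <- floor(T_vals^(1/3))   # 2, 3, 3, 4
setNames(p_T, T_vals)
```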

4 Estimators

4.1 Mean Group (MG)

The Mean Group estimator of Pesaran and Smith (1995) runs OLS separately for each unit \(i\):

\[ \hat{\boldsymbol{\beta}}_i^{\text{OLS}} = (\mathbf{X}_i' \mathbf{X}_i)^{-1} \mathbf{X}_i' \mathbf{y}_i, \]

and averages:

\[ \hat{\boldsymbol{\beta}}^{\text{MG}} = N^{-1} \sum_{i=1}^N \hat{\boldsymbol{\beta}}_i. \]

MG is consistent for \(N, T \to \infty\) under heterogeneous slopes, but inconsistent when CSD is present via common factors. It serves as a useful benchmark.
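The mechanics are simple enough to sketch in base R, using the nonparametric variance estimator \(\widehat{\text{Var}}(\hat{\boldsymbol{\beta}}^{\text{MG}}) = \frac{1}{N(N-1)}\sum_i (\hat\beta_i - \hat\beta^{\text{MG}})^2\) (a sketch on simulated data, not the package's code):

```r
set.seed(3)
N <- 60; T_len <- 80
beta_i <- rnorm(N, mean = 2, sd = 0.3)          # heterogeneous slopes around 2
x <- matrix(rnorm(N * T_len), N)
y <- beta_i * x + matrix(rnorm(N * T_len), N)   # row i uses slope beta_i

b_unit <- sapply(1:N, function(i) coef(lm(y[i, ] ~ x[i, ]))[2])  # unit OLS
b_mg   <- mean(b_unit)                                   # MG estimate
se_mg  <- sqrt(sum((b_unit - b_mg)^2) / (N * (N - 1)))   # MG std. error
c(estimate = b_mg, se = se_mg)
```

The MG standard error reflects both the slope heterogeneity and the unit-level estimation noise.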

fit_mg <- dcce(
  data            = pwt8,
  unit_index      = "country",
  time_index      = "year",
  formula         = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
  model           = "mg",
  cross_section_vars = NULL
)
print(fit_mg)
#> 
#> Dynamic Common Correlated Effects Estimation
#> ============================================
#> Estimator    : Mean Group (MG)
#> Dep. variable: d_log_rgdpo
#> Groups (N)   : 93         Time periods (T): 48
#> Observations : 4371      
#> 
#> Mean Group Coefficients 
#> -----------------------
#>                             Coef.  Std. Err.        z     P>|z|   [95% CI         ]   Sig
#> (Intercept)                1.0501     0.1987     5.28    0.0000    0.6606    1.4397   ***
#> L(log_rgdpo,1)            -0.1996     0.0185   -10.79    0.0000   -0.2358   -0.1633   ***
#> log_hc                     0.0441     0.0847     0.52    0.6025   -0.1219    0.2101      
#> log_ck                     0.0387     0.0150     2.58    0.0099    0.0093    0.0681    **
#> log_ngd                   -0.0208     0.0599    -0.35    0.7286   -0.1383    0.0967      
#> 
#> R-sq (MG): 0.210   RMSE: 0.0678

The L(log_rgdpo, 1) term is the first lag of log GDP—its MG coefficient is \(({\hat\rho} - 1)\), so a negative value indicates mean-reverting dynamics.

4.2 CCE Mean Group (CCE-MG)

The CCE-MG estimator augments each unit’s regression with contemporaneous CSAs:

\[ y_{it} = \alpha_i + \boldsymbol{\beta}_i' \mathbf{x}_{it} + \boldsymbol{\delta}_i' \bar{\mathbf{z}}_t + u_{it}, \]

where \(\bar{\mathbf{z}}_t = (\bar{y}_t, \bar{\mathbf{x}}_t')'\). The coefficients on the CSAs absorb the unobserved factors; the structural parameters \(\hat{\boldsymbol{\beta}}_i\) are then averaged over units.

Pesaran (2006) establishes consistency and asymptotic normality of the CCE-MG estimator under mild conditions.

fit_cce <- dcce(
  data               = pwt8,
  unit_index         = "country",
  time_index         = "year",
  formula            = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
  model              = "cce",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 0
)
print(fit_cce)
#> 
#> Dynamic Common Correlated Effects Estimation
#> ============================================
#> Estimator    : CCE (Mean Group)
#> Dep. variable: d_log_rgdpo
#> Groups (N)   : 93         Time periods (T): 48
#> Observations : 4371       CSA lags        : 0
#> 
#> Mean Group Coefficients 
#> -----------------------
#>                             Coef.  Std. Err.        z     P>|z|   [95% CI         ]   Sig
#> (Intercept)               -1.9852     0.8150    -2.44    0.0149   -3.5826   -0.3878     *
#> L(log_rgdpo,1)            -0.4420     0.0210   -21.05    0.0000   -0.4832   -0.4009   ***
#> log_hc                    -0.5071     0.2236    -2.27    0.0234   -0.9454   -0.0687     *
#> log_ck                     0.1044     0.0353     2.96    0.0031    0.0353    0.1735    **
#> log_ngd                   -0.0564     0.0864    -0.65    0.5142   -0.2257    0.1130      
#> 
#> R-sq (MG): 0.429   RMSE: 0.0594

Comparing MG and CCE-MG shows the effect of correcting for CSD: the coefficients shift, reflecting the removal of common-factor-driven bias.

4.3 Dynamic CCE (DCCE)

When the lagged dependent variable is included, Chudik and Pesaran (2015) require CSA lags to achieve asymptotically unbiased estimates. The DCCE estimator augments each unit’s regression with \(\bar{\mathbf{z}}_t, \bar{\mathbf{z}}_{t-1}, \ldots, \bar{\mathbf{z}}_{t-p_T}\).

fit_dcce <- dcce(
  data               = pwt8,
  unit_index         = "country",
  time_index         = "year",
  formula            = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
  model              = "dcce",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 3
)
print(fit_dcce)
#> 
#> Dynamic Common Correlated Effects Estimation
#> ============================================
#> Estimator    : DCCE (Mean Group)
#> Dep. variable: d_log_rgdpo
#> Groups (N)   : 93         Time periods (T): 48
#> Observations : 4092       CSA lags        : 3
#> 
#> Mean Group Coefficients 
#> -----------------------
#>                             Coef.  Std. Err.        z     P>|z|   [95% CI         ]   Sig
#> (Intercept)               -2.1741     1.7773    -1.22    0.2212   -5.6577    1.3094      
#> L(log_rgdpo,1)            -0.6136     0.0309   -19.83    0.0000   -0.6742   -0.5529   ***
#> log_hc                    -1.2868     0.3880    -3.32    0.0009   -2.0473   -0.5264   ***
#> log_ck                     0.2098     0.0491     4.27    0.0000    0.1135    0.3061   ***
#> log_ngd                    0.0053     0.1001     0.05    0.9580   -0.1909    0.2015      
#> 
#> R-sq (MG): 0.703   RMSE: 0.0377

The coefficient on L(log_rgdpo, 1) is the speed of adjustment in the error correction framework. A value of approximately \(-0.6\) implies \(\hat\rho \approx 0.4\), consistent with mean-reverting dynamics in log GDP.

4.3.1 Verify that DCCE removes CSD

A key diagnostic is to apply the CD test to the DCCE residuals. If the estimator has successfully absorbed the common factors, the residuals should no longer exhibit significant CSD:

cd_dcce <- pcd_test(fit_dcce, test = "pesaran")
print(cd_dcce)
#> 
#> Cross-Sectional Dependence Tests
#> ================================
#> N = 93, T = 44
#> 
#> CD        statistic =   1.4046  p-value = 0.1601

After DCCE correction with three CSA lags, the CD statistic falls to an insignificant level, confirming that the common factors have been filtered out.

4.4 Regularized CCE (rCCE)

The regularized CCE of Juodis (2022) addresses the case where the number of CSA variables is large relative to \(T\). It replaces the full CSA matrix with its first few principal components, regularizing the factor proxy:

fit_rcce <- dcce(
  data               = pwt8,
  unit_index         = "country",
  time_index         = "year",
  formula            = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
  model              = "rcce",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 0
)
print(fit_rcce)
#> 
#> Dynamic Common Correlated Effects Estimation
#> ============================================
#> Estimator    : Regularized CCE
#> Dep. variable: d_log_rgdpo
#> Groups (N)   : 93         Time periods (T): 48
#> Observations : 4371       CSA lags        : 0
#> 
#> Mean Group Coefficients 
#> -----------------------
#>                             Coef.  Std. Err.        z     P>|z|   [95% CI         ]   Sig
#> (Intercept)               -1.9852     0.8150    -2.44    0.0149   -3.5826   -0.3878     *
#> L(log_rgdpo,1)            -0.4420     0.0210   -21.05    0.0000   -0.4832   -0.4009   ***
#> log_hc                    -0.5071     0.2236    -2.27    0.0234   -0.9454   -0.0687     *
#> log_ck                     0.1044     0.0353     2.96    0.0031    0.0353    0.1735    **
#> log_ngd                   -0.0564     0.0864    -0.65    0.5142   -0.2257    0.1130      
#> 
#> R-sq (MG): 0.429   RMSE: 0.0594
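The principal-components idea behind rCCE can be illustrated in base R (a sketch of the concept, not the package's internals): when many CSA columns are all driven by a few factors, a handful of components carries almost all of their information.

```r
set.seed(9)
T_len <- 60
f <- rnorm(T_len)                          # one dominant common factor
# Eight CSA columns, all driven by the same factor plus noise
CSA <- outer(f, runif(8, 0.5, 1.5)) + matrix(rnorm(T_len * 8, sd = 0.3), T_len)
pc <- prcomp(CSA, center = TRUE, scale. = TRUE)
# Share of variance captured by the first three components
round(summary(pc)$importance["Proportion of Variance", 1:3], 3)
```

The first component absorbs most of the variance, so using it alone in place of all eight CSA columns regularizes the factor proxy.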

4.5 Long-Run Estimators: CS-DL, CS-ARDL, and PMG

For researchers interested in long-run relationships, dcce implements three estimators. CS-ARDL and PMG print three blocks: short-run coefficients, the adjustment term (the speed of adjustment to equilibrium), and long-run coefficients. CS-DL estimates the long-run coefficients directly, so its output contains no adjustment block.

4.5.1 CS-DL (Cross-Sectionally Augmented Distributed Lag)

The CS-DL estimator of Chudik et al. (2016) uses \(\Delta y_{it}\) as the dependent variable and regresses it on the level of \(x_{it}\) (whose coefficient is the long-run effect) plus the contemporaneous and lagged first differences of \(x\) (as short-run controls) plus CSAs:

\[ \Delta y_{it} = \alpha_i + w'_i x_{it} + \sum_{\ell=0}^{p_x} \phi'_{i\ell} \Delta x_{i,t-\ell} + \delta'_i \bar{\mathbf{z}}_t + u_{it}. \]

dcce automatically constructs \(\Delta y\) as the LHS and adds \(\Delta x, \Delta x_{-1}, \ldots, \Delta x_{-p_x}\) when model = "csdl". You just provide the levels formula and set csdl_xlags:

fit_csdl <- dcce(
  data               = pwt8,
  unit_index         = "country",
  time_index         = "year",
  formula            = log_rgdpo ~ log_ck + log_hc + log_ngd,
  model              = "csdl",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 3,
  csdl_xlags         = 3
)
print(fit_csdl)
#> 
#> Dynamic Common Correlated Effects Estimation
#> ============================================
#> Estimator    : CS-DL (Long-run)
#> Dep. variable: d.log_rgdpo
#> Groups (N)   : 93         Time periods (T): 48
#> Observations : 3999       CSA lags        : 3
#> 
#> Short-run (unit-level MG) Estimates 
#> -----------------------------------
#>                             Coef.  Std. Err.        z     P>|z|   [95% CI         ]   Sig
#> (Intercept)                1.1411     3.5539     0.32    0.7481   -5.8244    8.1067      
#> log_ck                    -0.0712     0.1028    -0.69    0.4886   -0.2727    0.1303      
#> d.log_ck                   0.4786     0.1293     3.70    0.0002    0.2253    0.7320   ***
#> d.log_ck_L1               -0.4448     0.1214    -3.66    0.0002   -0.6828   -0.2068   ***
#> d.log_ck_L2               -0.0424     0.1120    -0.38    0.7050   -0.2619    0.1771      
#> d.log_ck_L3               -0.0947     0.0810    -1.17    0.2427   -0.2535    0.0641      
#> log_hc                    -0.9120     0.8142    -1.12    0.2627   -2.5079    0.6839      
#> d.log_hc                   4.2664     2.0527     2.08    0.0377    0.2431    8.2897     *
#> d.log_hc_L1                2.0596     1.8832     1.09    0.2741   -1.6314    5.7506      
#> d.log_hc_L2                3.6903     1.5764     2.34    0.0192    0.6006    6.7800     *
#> d.log_hc_L3                0.9162     1.5994     0.57    0.5667   -2.2185    4.0509      
#> log_ngd                    0.3105     0.4792     0.65    0.5171   -0.6289    1.2498      
#> d.log_ngd                  0.5325     0.5359     0.99    0.3204   -0.5179    1.5830      
#> d.log_ngd_L1               0.1516     0.5985     0.25    0.8000   -1.0215    1.3247      
#> d.log_ngd_L2              -1.3005     0.4644    -2.80    0.0051   -2.2106   -0.3903    **
#> d.log_ngd_L3               0.6503     0.5418     1.20    0.2301   -0.4117    1.7123      
#> 
#> Long-run Estimates (CS-DL, direct)
#> ----------------------------------
#>                             Coef.  Std. Err.        z     P>|z|   [95% CI         ]   Sig
#> log_ck                    -0.0712     0.1028    -0.69    0.4886   -0.2727    0.1303      
#> log_hc                    -0.9120     0.8142    -1.12    0.2627   -2.5079    0.6839      
#> log_ngd                    0.3105     0.4792     0.65    0.5171   -0.6289    1.2498      
#> 
#> R-sq (MG): 0.828   RMSE: 0.0292

The coefficients on log_ck, log_hc, and log_ngd in the Long-run Estimates block are the long-run elasticities of output with respect to physical capital, human capital, and \(n+g+\delta\), respectively.

4.5.2 CS-ARDL (Cross-Sectionally Augmented ARDL)

The CS-ARDL estimator fits an ARDL\((p_y, p_x)\) model with CSAs per unit and recovers long-run coefficients and the speed of adjustment via the delta method:

\[ y_{it} = \alpha_i + \sum_{p=1}^{P_y} \phi_{ip} y_{i,t-p} + \sum_{q=0}^{P_x} \beta'_{iq} x_{i,t-q} + \delta'_i \bar{\mathbf{z}}_t + e_{it}, \]

with long-run coefficients \(\theta_{ik} = \frac{\sum_q \beta_{ikq}}{1 - \sum_p \phi_{ip}}\) and adjustment speed \(\varphi_i = -(1 - \sum_p \phi_{ip})\). Users specify the dynamic structure explicitly using L() in the formula:

fit_csardl <- dcce(
  data               = pwt8,
  unit_index         = "country",
  time_index         = "year",
  formula            = log_rgdpo ~ L(log_rgdpo, 1)
                                   + log_hc + L(log_hc, 1)
                                   + log_ck + L(log_ck, 1)
                                   + log_ngd + L(log_ngd, 1),
  model              = "csardl",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 3
)
print(fit_csardl)
#> 
#> Dynamic Common Correlated Effects Estimation
#> ============================================
#> Estimator    : CS-ARDL (Short + Long run)
#> Dep. variable: log_rgdpo
#> Groups (N)   : 93         Time periods (T): 48
#> Observations : 4092       CSA lags        : 3
#> 
#> Short-run (unit-level MG) Estimates 
#> -----------------------------------
#>                             Coef.  Std. Err.        z     P>|z|   [95% CI         ]   Sig
#> (Intercept)               -1.3682     1.8950    -0.72    0.4703   -5.0823    2.3459      
#> L(log_rgdpo,1)             0.3129     0.0321     9.74    0.0000    0.2499    0.3759   ***
#> log_hc                     0.6795     1.1096     0.61    0.5402   -1.4952    2.8543      
#> L(log_hc,1)               -1.9116     1.0569    -1.81    0.0705   -3.9832    0.1599     .
#> log_ck                     0.8477     0.0959     8.84    0.0000    0.6597    1.0357   ***
#> L(log_ck,1)               -0.5970     0.0798    -7.48    0.0000   -0.7535   -0.4405   ***
#> log_ngd                    0.2376     0.2785     0.85    0.3935   -0.3083    0.7836      
#> L(log_ngd,1)              -0.3126     0.3560    -0.88    0.3799   -1.0104    0.3852      
#> 
#> Adjustment Term
#> ---------------
#>                             Coef.  Std. Err.        z     P>|z|   [95% CI         ]   Sig
#> adjustment                -0.6871     0.0323   -21.26    0.0000   -0.7504   -0.6238   ***
#> 
#> Long-run Estimates (MG)
#> -----------------------
#>                             Coef.  Std. Err.        z     P>|z|   [95% CI         ]   Sig
#> log_hc                    -1.7446     1.1215    -1.56    0.1198   -3.9427    0.4536      
#> log_ck                    -0.1074     0.3350    -0.32    0.7485   -0.7640    0.5492      
#> log_ngd                    1.8888     1.2457     1.52    0.1295   -0.5528    4.3303      
#> 
#> R-sq (MG): 0.979   RMSE: 0.0323

The output shows three blocks:

  1. Short-run coefficients from the ARDL(1,1) regression (unit-level OLS, MG average)
  2. Adjustment term \(\varphi \approx -0.69\), meaning about 69% of the deviation from equilibrium is corrected each period
  3. Long-run elasticities recovered via the delta method, with MG standard errors
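The long-run recovery can be reproduced by hand on a single simulated ARDL(1,1) series with known true values (a base-R sketch, not the package's delta-method code):

```r
set.seed(4)
T_len <- 20000
x <- as.numeric(arima.sim(list(ar = 0.8), n = T_len))
e <- rnorm(T_len)
y <- numeric(T_len)
# DGP: y_t = 0.5 y_{t-1} + 0.6 x_t - 0.2 x_{t-1} + e_t
# true long-run theta = (0.6 - 0.2)/(1 - 0.5) = 0.8, adjustment phi = -0.5
for (t in 2:T_len)
  y[t] <- 0.5 * y[t - 1] + 0.6 * x[t] - 0.2 * x[t - 1] + e[t]

fit <- lm(y[2:T_len] ~ y[1:(T_len - 1)] + x[2:T_len] + x[1:(T_len - 1)])
b <- coef(fit)
theta_hat <- (b[3] + b[4]) / (1 - b[2])   # sum(beta_q) / (1 - sum(phi_p))
phi_hat   <- -(1 - b[2])                  # speed of adjustment
c(long_run = unname(theta_hat), adjustment = unname(phi_hat))
```

The estimates land close to the true \(\theta = 0.8\) and \(\varphi = -0.5\) implied by the DGP.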

4.5.3 PMG (Pooled Mean Group)

The PMG estimator of Pesaran, Shin, and Smith (1999) imposes common long-run coefficients across units while keeping the adjustment and short-run dynamics heterogeneous. dcce implements PMG via inverse-variance weighted pooling of the unit-level CS-ARDL long-run estimates, a fast, consistent alternative to the Pesaran-Shin-Smith concentrated MLE that avoids numerical optimization.

fit_pmg <- dcce(
  data               = pwt8,
  unit_index         = "country",
  time_index         = "year",
  formula            = log_rgdpo ~ L(log_rgdpo, 1)
                                   + log_hc + L(log_hc, 1)
                                   + log_ck + L(log_ck, 1)
                                   + log_ngd + L(log_ngd, 1),
  model              = "pmg",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 3
)
print(fit_pmg)
#> 
#> Dynamic Common Correlated Effects Estimation
#> ============================================
#> Estimator    : Pooled Mean Group (PMG)
#> Dep. variable: log_rgdpo
#> Groups (N)   : 93         Time periods (T): 48
#> Observations : 4092       CSA lags        : 3
#> 
#> Short-run (unit-level MG) Estimates 
#> -----------------------------------
#>                             Coef.  Std. Err.        z     P>|z|   [95% CI         ]   Sig
#> (Intercept)               -1.3682     1.8950    -0.72    0.4703   -5.0823    2.3459      
#> L(log_rgdpo,1)             0.3129     0.0321     9.74    0.0000    0.2499    0.3759   ***
#> log_hc                     0.6795     1.1096     0.61    0.5402   -1.4952    2.8543      
#> L(log_hc,1)               -1.9116     1.0569    -1.81    0.0705   -3.9832    0.1599     .
#> log_ck                     0.8477     0.0959     8.84    0.0000    0.6597    1.0357   ***
#> L(log_ck,1)               -0.5970     0.0798    -7.48    0.0000   -0.7535   -0.4405   ***
#> log_ngd                    0.2376     0.2785     0.85    0.3935   -0.3083    0.7836      
#> L(log_ngd,1)              -0.3126     0.3560    -0.88    0.3799   -1.0104    0.3852      
#> 
#> Adjustment Term
#> ---------------
#>                             Coef.  Std. Err.        z     P>|z|   [95% CI         ]   Sig
#> adjustment                -0.6871     0.0323   -21.26    0.0000   -0.7504   -0.6238   ***
#> 
#> Long-run Estimates (Pooled, PMG)
#> --------------------------------
#>                             Coef.  Std. Err.        z     P>|z|   [95% CI         ]   Sig
#> log_hc                     0.8299     0.1661     5.00    0.0000    0.5043    1.1556   ***
#> log_ck                     0.1929     0.0200     9.64    0.0000    0.1537    0.2321   ***
#> log_ngd                   -0.0420     0.0334    -1.26    0.2093   -0.1075    0.0236      
#> 
#> R-sq (MG): 0.979   RMSE: 0.0323

Compared to CS-ARDL, PMG produces tighter standard errors on the long-run coefficients because it exploits the homogeneity restriction.
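The precision gain follows directly from the pooling scheme, as a toy example shows (all numbers below are hypothetical, invented for illustration):

```r
theta_i <- c(0.9, 1.1, 1.0, 0.8, 1.2)       # hypothetical unit-level estimates
se_i    <- c(0.10, 0.40, 0.15, 0.20, 0.50)  # hypothetical standard errors
w <- 1 / se_i^2                              # precision weights
theta_pooled <- sum(w * theta_i) / sum(w)    # inverse-variance weighted mean
se_pooled    <- sqrt(1 / sum(w))             # smaller than every se_i
c(pooled = theta_pooled, se = se_pooled)
```

Because precisions add, the pooled standard error is necessarily below the smallest unit-level standard error.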

4.6 Comparing All Estimators

models <- list(
  MG      = fit_mg,
  CCE     = fit_cce,
  DCCE    = fit_dcce,
  CS_DL   = fit_csdl,
  CS_ARDL = fit_csardl
)

# Extract coefficient on physical capital (log_ck) across models
key_var <- "log_ck"
compare <- data.frame(
  estimator = names(models),
  estimate  = vapply(models, function(m) {
    b <- coef(m)
    if (key_var %in% names(b)) unname(b[key_var]) else NA_real_
  }, numeric(1)),
  std_error = vapply(models, function(m) {
    se <- m$se
    if (key_var %in% names(se)) unname(se[key_var]) else NA_real_
  }, numeric(1))
)
compare$t_stat <- compare$estimate / compare$std_error
print(compare, digits = 4)
#>         estimator estimate std_error  t_stat
#> MG             MG  0.03871   0.01502  2.5777
#> CCE           CCE  0.10443   0.03525  2.9622
#> DCCE         DCCE  0.20981   0.04912  4.2710
#> CS_DL       CS_DL -0.07121   0.10282 -0.6926
#> CS_ARDL   CS_ARDL  0.84768   0.09591  8.8385

Note: MG, CCE, and DCCE estimate a short-run coefficient on log_ck within a dynamic error-correction specification (with L(log_rgdpo,1) on the RHS). CS-DL and CS-ARDL target the long-run elasticity of GDP with respect to capital, using a levels-in-first-difference specification. Direct comparisons across the two groups should be interpreted with care.

5 The CD Test Suite

dcce implements five CD tests, each addressing a different aspect of the CSD problem.

Test      Reference                       Description
-------   -----------------------------   ----------------------------------------------------
pesaran   Pesaran (2015)                  Benchmark; N(0,1) under H0
cdw       Juodis and Reese (2022)         Rademacher-weighted; robust to incidental parameters
cdwplus   Baltagi, Feng, and Kao (2012)   Bias-adjusted LM with random sign weighting
pea       Fan, Liao, and Yao (2015)       Power-enhanced; better against sparse alternatives
cdstar    Pesaran and Xie (2021)          Bias-corrected for panels with strong factors

# Run all five CD tests on DCCE residuals
set.seed(42)
cd_all <- pcd_test(fit_dcce,
                   test   = c("pesaran", "cdw", "cdwplus", "pea", "cdstar"),
                   n_reps = 199)
print(cd_all)
#> 
#> Cross-Sectional Dependence Tests
#> ================================
#> N = 93, T = 44
#> 
#> CD        statistic =   1.4046  p-value = 0.1601
#> CDw       statistic =  -0.0977  p-value = 0.9222
#> CDw+      statistic =   1.1500  p-value = 0.2501
#> PEA       statistic =   8.9999  p-value = 0.0000
#> CD*       statistic =   2.2780  p-value = 0.0227

Interpretation: the five tests have different power properties. The benchmark CD and its weighted variants (CDw, CDw+) do not reject, consistent with the common factors having been absorbed. The power-enhanced PEA and the bias-corrected CD* still reject, which can indicate residual dependence of the sparse or weak kind those tests are designed to detect.

The key diagnostic is the Pesaran CD: after DCCE with three lags, the statistic falls well inside the \(\pm 1.96\) critical bounds, confirming that the strong common factors have been filtered out.

6 Bootstrap Inference

Asymptotic standard errors for DCCE rest on the central limit theorem approximation \(\hat{\boldsymbol{\beta}}^{\text{MG}} \approx \mathcal{N}(\boldsymbol{\beta}, V/N)\). In panels with small \(N\), bootstrap confidence intervals are preferable.
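The cross-section scheme is easy to sketch by hand: resample whole units with replacement and recompute the mean-group average each time (illustrative base R on stand-in unit-level estimates, not the package's implementation):

```r
set.seed(5)
N <- 40
b_unit <- rnorm(N, mean = 1, sd = 0.5)   # stand-in unit-level coefficients
boot_mg <- replicate(999, mean(b_unit[sample.int(N, replace = TRUE)]))
ci <- quantile(boot_mg, c(0.025, 0.975)) # percentile confidence interval
round(ci, 3)
```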

dcce provides a bootstrap() function with two resampling schemes; here we use the cross-section scheme (type = "crosssection"), which resamples whole units with replacement:

# Cross-section bootstrap (not evaluated in vignette to limit build time)
set.seed(42)
boot_res <- bootstrap(fit_dcce, type = "crosssection", reps = 499)
print(boot_res)

For a quick illustration on the simulated dataset (smaller \(N\) and \(T\)):

data(dcce_sim, package = "dcce")

fit_sim <- dcce(
  data               = dcce_sim,
  unit_index         = "unit",
  time_index         = "time",
  formula            = y ~ L(y, 1) + x,
  model              = "dcce",
  cross_section_vars = ~ y + x,
  cross_section_lags = 3
)

set.seed(42)
boot_sim <- bootstrap(fit_sim, type = "crosssection", reps = 199)
print(boot_sim)
#> 
#> Bootstrap Inference
#> ===================
#> Type: crosssection, Reps: 199
#> 
#>                 SE   CI_lo  CI_hi
#> (Intercept) 0.1487 -0.3245 0.2598
#> L(y,1)      0.0213  0.3462 0.4281
#> x           0.0466  0.8462 1.0348

7 Worked Example: Ditzen (2018) Growth Regression

We replicate the core results from Ditzen (2018), Section 7, using the pwt8 dataset (Penn World Tables 8, N = 93 countries, T = 48 years, 1960–2007). The model is a Solow-type growth regression:

\[ \Delta \log y_{it} = \alpha_i + \phi_i \log y_{i,t-1} + \beta_{1i} \log h_{it} + \beta_{2i} \log k_{it} + \beta_{3i} \log(n_{it} + g + \delta) + \text{CSD correction} + u_{it}, \]

where \(y\) is real GDP (output-side), \(h\) is the human capital index, \(k\) is the physical capital stock, \(n+g+\delta\) aggregates population growth and depreciation, and \(\phi_i = \rho_i - 1\) is the speed of adjustment.

7.1 Step 1: Detect CSD in naive residuals

dat <- na.omit(pwt8[, c("country", "year", "d_log_rgdpo",
                          "log_rgdpo", "log_hc", "log_ck", "log_ngd")])
fit_ols2 <- lm(d_log_rgdpo ~ log_hc + log_ck + log_ngd + log_rgdpo, data = dat)
cd_naive <- pcd_test(residuals(fit_ols2), data = dat,
                     unit_index = "country", time_index = "year",
                     test = "pesaran")
cat(sprintf("CD statistic (OLS): %.2f  p-value: %.4f\n",
            cd_naive$statistics$statistic[1],
            cd_naive$statistics$p_value[1]))
#> CD statistic (OLS): 35.38  p-value: 0.0000

Result: The CD statistic is large and significant, confirming strong CSD.

7.2 Step 2: Mean Group (no CSD correction)

fit_mg2 <- dcce(
  data               = pwt8,
  unit_index         = "country",
  time_index         = "year",
  formula            = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
  model              = "mg",
  cross_section_vars = NULL
)

cd_mg <- pcd_test(fit_mg2, test = "pesaran")
cat(sprintf("MG: phi = %.3f, log_ck = %.3f\n",
            coef(fit_mg2)["L(log_rgdpo,1)"], coef(fit_mg2)["log_ck"]))
#> MG: phi = -0.200, log_ck = 0.039
cat(sprintf("CD after MG: %.2f  p-value: %.4f\n",
            cd_mg$statistics$statistic[1], cd_mg$statistics$p_value[1]))
#> CD after MG: 30.86  p-value: 0.0000

MG residuals still exhibit significant CSD because MG does not correct for common factors.

7.3 Step 3: Dynamic CCE with 3 CSA lags (Ditzen 2018 Example 7.3)

fit_dcce2 <- dcce(
  data               = pwt8,
  unit_index         = "country",
  time_index         = "year",
  formula            = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
  model              = "dcce",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 3
)

cd_dcce2 <- pcd_test(fit_dcce2, test = "pesaran")
cat(sprintf("DCCE: phi = %.3f, log_ck = %.3f\n",
            coef(fit_dcce2)["L(log_rgdpo,1)"], coef(fit_dcce2)["log_ck"]))
#> DCCE: phi = -0.614, log_ck = 0.210
cat(sprintf("CD after DCCE(3): %.2f  p-value: %.4f\n",
            cd_dcce2$statistics$statistic[1], cd_dcce2$statistics$p_value[1]))
#> CD after DCCE(3): 1.40  p-value: 0.1601

After applying DCCE with three CSA lags, the CD statistic is no longer significant, indicating that the estimator has successfully absorbed the common factors.

7.4 Step 4: Summary table

# Coefficient table comparing MG and DCCE
vars <- c("L(log_rgdpo,1)", "log_hc", "log_ck", "log_ngd")
tab <- data.frame(
  Variable = vars,
  MG_coef  = round(coef(fit_mg2)[vars], 3),
  MG_se    = round(fit_mg2$se[vars], 3),
  DCCE_coef = round(coef(fit_dcce2)[vars], 3),
  DCCE_se   = round(fit_dcce2$se[vars], 3),
  row.names = NULL
)
tab$MG_sig   <- ifelse(abs(coef(fit_mg2)[vars] / fit_mg2$se[vars]) > 1.96, "*", "")
tab$DCCE_sig <- ifelse(abs(coef(fit_dcce2)[vars] / fit_dcce2$se[vars]) > 1.96, "*", "")
print(tab)
#>         Variable MG_coef MG_se DCCE_coef DCCE_se MG_sig DCCE_sig
#> 1 L(log_rgdpo,1)  -0.200 0.018    -0.614   0.031      *        *
#> 2         log_hc   0.044 0.085    -1.287   0.388               *
#> 3         log_ck   0.039 0.015     0.210   0.049      *        *
#> 4        log_ngd  -0.021 0.060     0.005   0.100

Interpretation:

  - The speed of adjustment roughly triples in magnitude (from \(-0.200\) under MG to \(-0.614\) under DCCE) once the common factors are controlled for.
  - The capital elasticity rises from 0.039 to 0.210 and remains significant.
  - The human capital coefficient changes sign and becomes significant, while log_ngd is insignificant under both estimators.

7.5 Step 5: Unit-level heterogeneity

One advantage of the Mean Group approach is access to unit-level estimates. We can examine the cross-country distribution of the dynamics via the implied AR(1) coefficient \(\hat\rho_i = \hat\phi_i + 1\):

b_unit <- coef(fit_dcce2, type = "unit")
ar_coefs <- b_unit$estimate[b_unit$term == "L(log_rgdpo,1)"]
rho_implied <- ar_coefs + 1  # rho = (phi + 1)

cat(sprintf("Implied rho: mean = %.3f, median = %.3f, [min, max] = [%.3f, %.3f]\n",
            mean(rho_implied), median(rho_implied),
            min(rho_implied), max(rho_implied)))
#> Implied rho: mean = 0.386, median = 0.392, [min, max] = [-0.341, 0.964]
cat(sprintf("Fraction with 0 < rho < 1 (stationarity): %.1f%%\n",
            100 * mean(rho_implied > 0 & rho_implied < 1)))
#> Fraction with 0 < rho < 1 (stationarity): 87.1%

hist(
  rho_implied,
  breaks = 20,
  main   = "Unit-level AR(1) coefficient (rho = phi + 1)",
  xlab   = expression(hat(rho)[i]),
  col    = "steelblue",
  border = "white"
)
abline(v = median(rho_implied), col = "firebrick", lwd = 2, lty = 2)
legend("topright", legend = "Median", col = "firebrick", lty = 2, lwd = 2, bty = "n")

Histogram of unit-level implied AR(1) coefficients

The distribution is concentrated in \((0, 1)\), confirming stationary dynamics for most countries, with the median \(\hat\rho \approx 0.4\).
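
The persistence estimates translate directly into half-lives of convergence: with reversion rate \(\rho\), half of a deviation from the long-run path is closed after \(h = \ln(0.5)/\ln(\rho)\) periods. A base-R sketch using the rho_implied vector computed above, restricted to the stationary units for which the half-life is defined:

```r
# Half-life of convergence: h = log(0.5) / log(rho), defined for 0 < rho < 1
rho_stat  <- rho_implied[rho_implied > 0 & rho_implied < 1]
half_life <- log(0.5) / log(rho_stat)
summary(half_life)

# e.g. the median rho of about 0.39 implies a half-life of roughly 0.74 years
log(0.5) / log(0.39)
```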

8 Auxiliary Diagnostics

8.1 Exponent of cross-sectional dependence

The CSD exponent \(\alpha\) of Bailey, Kapetanios, and Pesaran (2016) characterizes the strength of dependence. For \(\alpha = 1\) the panel exhibits strong CSD; for \(\alpha < 1/2\) the CSD is weak and standard estimators remain consistent.

# CSD exponent of log real GDP (levels) across countries
data(pwt8)
X_lev <- do.call(rbind, by(pwt8, pwt8$country, function(d) d$log_rgdpo))
# Keep only time periods with no missing values
X_lev <- X_lev[, colSums(is.na(X_lev)) == 0, drop = FALSE]
alpha_res <- csd_exp(X_lev, use_residuals = FALSE)
print(alpha_res)
#> 
#> Exponent of Cross-Sectional Dependence
#> ======================================
#> Method  : BKP (2016)
#> N = 93, T = 48
#> 
#> alpha = 0.8348  (SE = 0.0114)
#> 95% CI: [0.8124, 0.8573]

An estimate of \(\hat\alpha \approx 0.83\) is close to 1, signalling strong CSD in log real GDP and providing further motivation for using DCCE. For first-differenced data (growth rates) the exponent is typically much lower because differencing strips out the common trend.
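
The mechanics can be illustrated with a back-of-envelope version of the BKP idea: under exponent \(\alpha\), the variance of the cross-sectional average of the standardised data scales as \(N^{2(\alpha-1)}\), suggesting \(\hat\alpha \approx 1 + \ln \widehat{\mathrm{Var}}(\bar x_t)/(2 \ln N)\). The sketch below omits the bias corrections that csd_exp() applies, so it will not match the printed estimate exactly:

```r
# Naive BKP exponent: Var(xbar_t) ~ N^(2(alpha - 1)) for standardised data
X_std  <- t(scale(t(X_lev)))   # standardise each unit's series over time
xbar_t <- colMeans(X_std)      # cross-sectional average at each period
1 + log(var(xbar_t)) / (2 * log(nrow(X_std)))
```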

8.2 Information criterion for CSA lag selection

Margaritella and Westerlund (2023) propose IC and PC criteria for selecting the cross-sectional averages to include in static CCE models. We use them here to compare static CCE specifications with 0–3 lags of the CSAs (via cross_section_lags). For dynamic panels, the Chudik-Pesaran rule \(p_T = \lfloor T^{1/3} \rfloor\) is the standard choice for CSA lags.

# Compare static CCE specifications with 0-3 lags of the CSAs
# (the IC criteria apply to the static CCE case)
lags_to_try <- 0:3
models_ic <- lapply(lags_to_try, function(p) {
  dcce(
    data               = pwt8,
    unit_index         = "country",
    time_index         = "year",
    formula            = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
    model              = "cce",
    cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
    cross_section_lags = p
  )
})

ic_values <- sapply(models_ic, function(m) {
  res <- ic(m, models = models_ic)
  c(IC1 = res$IC1, IC2 = res$IC2)
})
ic_table <- data.frame(lags = lags_to_try, t(ic_values))
print(ic_table, digits = 5)
#>   lags     IC1     IC2
#> 1    0 -5.2096 -5.1570
#> 2    1 -4.9671 -4.8619
#> 3    2 -4.7565 -4.5987
#> 4    3 -4.8087 -4.5984
cat(sprintf("IC1 selects %d CSA lag(s)\n", lags_to_try[which.min(ic_table$IC1)]))
#> IC1 selects 0 CSA lag(s)
cat(sprintf("IC2 selects %d CSA lag(s)\n", lags_to_try[which.min(ic_table$IC2)]))
#> IC2 selects 0 CSA lag(s)

A value of 0 selected by IC indicates that contemporaneous CSAs suffice for the static CCE specification, consistent with the theoretical recommendation of Pesaran (2006). For the dynamic DCCE application above we instead use 3 CSA lags, as prescribed by the Chudik-Pesaran rule \(p_T = \lfloor T^{1/3} \rfloor = 3\) for \(T = 48\).
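
The Chudik-Pesaran rule is a one-liner; for the \(T = 48\) panel used here it gives exactly three CSA lags:

```r
# p_T = floor(T^(1/3)) lags of the cross-sectional averages
TT <- 48
floor(TT^(1/3))
#> [1] 3
```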

8.3 Rank condition

The rank condition of De Vos, Everaert, and Sarafidis (2024) checks whether the CSA variables span the factor space, a necessary condition for CCE consistency:

rc <- rank_condition(fit_cce)
cat(sprintf("Estimated factors (m): %d\n", rc$m))
#> Estimated factors (m): 1
cat(sprintf("Rank of avg loadings (g): %d\n", rc$g))
#> Rank of avg loadings (g): 4
cat(sprintf("Rank condition (RC = 1 means satisfied): %d\n", rc$RC))
#> Rank condition (RC = 1 means satisfied): 1

8.4 Panel unit root test: Pesaran CIPS

The CIPS test of Pesaran (2007) is the workhorse panel unit root test under cross-sectional dependence. It augments unit-level Dickey-Fuller regressions with cross-sectional averages and averages the resulting CADF t-statistics.

X_lev <- do.call(rbind, by(pwt8, pwt8$country, function(d) d$log_rgdpo))
X_lev <- X_lev[, colSums(is.na(X_lev)) == 0, drop = FALSE]
cips_res <- cips_test(X_lev, lags = 1, trend = FALSE)
print(cips_res)
#> 
#> Pesaran (2007) CIPS Panel Unit Root Test
#> ========================================
#> N = 93, T = 48, lags = 1, trend = no
#> 
#> H0: all series have a unit root
#> CIPS statistic =  -1.8483    approx p-value = 0.5000
#> 
#> Critical values (Pesaran 2007):
#>   1%:  -2.50    5%:  -2.27    10%:  -2.15

Here the CIPS statistic of \(-1.85\) lies above the 5% critical value of \(-2.27\), so the null of a panel unit root in log real GDP is not rejected (a statistic below the critical value would reject it). This is consistent with the levels series being I(1), and with modelling GDP growth rather than levels in the regressions above.
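
To make the mechanics concrete, the CADF building block can be sketched in base R on simulated random walks. This is a simplified version with no additional ADF lag augmentation; cips_test() handles lags, deterministic trends, and the correct critical values:

```r
set.seed(1)
N <- 20; TT <- 50
Y    <- replicate(N, cumsum(rnorm(TT)))   # TT x N panel of pure random walks
ybar <- rowMeans(Y)                       # cross-sectional average
cadf_t <- sapply(seq_len(N), function(i) {
  y <- Y[, i]
  d <- data.frame(dy    = diff(y),        # Delta y_t
                  ylag  = y[-TT],         # y_{t-1}
                  ybarl = ybar[-TT],      # CSA at t-1
                  dybar = diff(ybar))     # Delta CSA at t
  summary(lm(dy ~ ylag + ybarl + dybar, data = d))$coefficients["ylag", "t value"]
})
mean(cadf_t)   # the CIPS statistic: average of the unit-level CADF t-stats
```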

8.5 Slope heterogeneity: Swamy and Pesaran-Yamagata

Before deciding between MG and a pooled estimator, you should formally test whether slopes are homogeneous. The Swamy (1970) test and its standardised large-\(N\) version due to Pesaran & Yamagata (2008) are implemented as swamy_test():

swamy_test(fit_cce)
#> 
#> Swamy / Pesaran-Yamagata Slope Homogeneity Test
#> ===============================================
#> N = 93, k = 4
#> H0: slope coefficients are homogeneous across units
#> 
#> Swamy S           =  1190.8803    df = 368    p-value = 0.0000
#> Pesaran-Yamagata  =    30.0216                 p-value = 0.0000

A large, significant statistic rejects homogeneity and justifies heterogeneous slopes (MG/CCE/DCCE). For this growth regression the test rejects decisively, confirming that growth dynamics differ across countries.

8.6 Hausman test: MG vs pooled

A Hausman-style comparison of MG against the weighted pooled estimator:

hausman_test(fit_cce)
#> 
#> Hausman-type Test: MG vs Pooled
#> ================================
#> H0: slopes are homogeneous (pooled consistent)
#> 
#> H statistic =   126.6929    df = 4    p-value = 0.0000
#> 
#> MG vs Pooled coefficients:
#>                     MG  Pooled    Diff
#> L(log_rgdpo,1) -0.4420 -0.2533 -0.1888
#> log_hc         -0.5071 -0.0250 -0.4821
#> log_ck          0.1044  0.0521  0.0524
#> log_ngd        -0.0564  0.0407 -0.0971

Under the null (homogeneous slopes) both estimators are consistent; under the alternative only MG is. Rejection of the null is another signal that the homogeneous-slope assumption is too restrictive.

8.7 broom compatibility

dcce_fit objects are compatible with broom::tidy() and broom::glance(). For long-run estimators, tidy() additionally includes rows for the adjustment speed and the long-run coefficients, tagged in the type column:

tidy(fit_dcce2)
#> # A tibble: 5 × 8
#>   term           estimate std.error statistic  p.value conf.low conf.high type 
#>   <chr>             <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl> <chr>
#> 1 (Intercept)    -2.17       1.78     -1.22   2.21e- 1   -5.66      1.31  mg   
#> 2 L(log_rgdpo,1) -0.614      0.0309  -19.8    1.58e-87   -0.674    -0.553 mg   
#> 3 log_hc         -1.29       0.388    -3.32   9.11e- 4   -2.05     -0.526 mg   
#> 4 log_ck          0.210      0.0491    4.27   1.95e- 5    0.114     0.306 mg   
#> 5 log_ngd         0.00528    0.100     0.0527 9.58e- 1   -0.191     0.201 mg
glance(fit_dcce2)
#> # A tibble: 1 × 10
#>    nobs n_units t_bar r.squared adj.r.squared   rmse cd_statistic cd_p.value
#>   <int>   <int> <dbl>     <dbl>         <dbl>  <dbl>        <dbl>      <dbl>
#> 1  4092      93    48     0.703            NA 0.0377           NA         NA
#> # ℹ 2 more variables: estimator <chr>, cr_lags <int>
# For a CS-ARDL fit, tidy() returns short-run, adjustment, and long-run rows
tidy(fit_csardl)
#> # A tibble: 12 × 8
#>    term          estimate std.error statistic   p.value conf.low conf.high type 
#>    <chr>            <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>
#>  1 (Intercept)     -1.37     1.89      -0.722 4.70e-  1   -5.08      2.35  mg   
#>  2 L(log_rgdpo,…    0.313    0.0321     9.74  2.10e- 22    0.250     0.376 mg   
#>  3 log_hc           0.680    1.11       0.612 5.40e-  1   -1.50      2.85  mg   
#>  4 L(log_hc,1)     -1.91     1.06      -1.81  7.05e-  2   -3.98      0.160 mg   
#>  5 log_ck           0.848    0.0959     8.84  9.70e- 19    0.660     1.04  mg   
#>  6 L(log_ck,1)     -0.597    0.0798    -7.48  7.57e- 14   -0.754    -0.441 mg   
#>  7 log_ngd          0.238    0.279      0.853 3.94e-  1   -0.308     0.784 mg   
#>  8 L(log_ngd,1)    -0.313    0.356     -0.878 3.80e-  1   -1.01      0.385 mg   
#>  9 adjustment      -0.687    0.0323   -21.3   2.42e-100   -0.750    -0.624 adju…
#> 10 log_hc          -1.74     1.12      -1.56  1.20e-  1   -3.94      0.454 lr   
#> 11 log_ck          -0.107    0.335     -0.321 7.49e-  1   -0.764     0.549 lr   
#> 12 log_ngd          1.89     1.25       1.52  1.29e-  1   -0.553     4.33  lr

8.8 Confidence intervals

The confint() method supports three types: "mg" for the main coefficients, "lr" for long-run estimates, and "adjustment" for the speed of adjustment:

confint(fit_csardl, type = "lr", level = 0.90)
#>               5.0%     95.0%
#> log_hc  -3.5892357 0.1001217
#> log_ck  -0.6583921 0.4436053
#> log_ngd -0.1602017 3.9377349
confint(fit_csardl, type = "adjustment")
#>                  2.5%      97.5%
#> adjustment -0.7504175 -0.6237584

8.9 Visualising heterogeneity

The plot() method shows the distribution of unit-level coefficients:

plot(fit_dcce2, which = "coef")

Histograms of unit-level coefficients for the DCCE growth regression

Each panel corresponds to one regressor; the dashed red line marks the MG mean. Use which = "resid" for a residual diagnostic plot.

9 Additional Estimators (v0.2.0+)

9.1 Augmented Mean Group (AMG)

The AMG estimator of Eberhardt and Teal (2010) accounts for CSD by extracting a Common Dynamic Process (CDP) from time-dummy coefficients in a pooled first-difference regression. The CDP is then added as a nuisance regressor in each unit’s OLS:

fit_amg <- dcce(
  data = pwt8, unit_index = "country", time_index = "year",
  formula = d_log_rgdpo ~ log_hc + log_ck + log_ngd,
  model = "amg", cross_section_vars = NULL
)
coef(fit_amg)
#>  (Intercept)       log_hc       log_ck      log_ngd 
#> -0.504785797  0.366026564  0.006631021 -0.122830380
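
The first stage can be sketched on simulated data: pool the first-differenced observations, add a full set of year dummies, and read off the dummy coefficients as the estimated common dynamic process. This is a simplified illustration of the idea, not the package's exact implementation:

```r
set.seed(2)
N <- 10; TT <- 30
f   <- cumsum(rnorm(TT))                     # unobserved common factor
pan <- do.call(rbind, lapply(seq_len(N), function(i) {
  x <- rnorm(TT)
  y <- 0.5 * x + f + rnorm(TT)
  data.frame(unit = i, year = seq_len(TT),
             dy = c(NA, diff(y)), dx = c(NA, diff(x)))
}))
pan <- subset(pan, year > 1)                 # drop the lost first period
fd  <- lm(dy ~ dx + factor(year) - 1, data = pan)  # no intercept: one dummy per year
cdp <- coef(fd)[grep("factor\\(year\\)", names(coef(fd)))]
length(cdp)   # one CDP coefficient per usable period (TT - 1 = 29)
```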

9.2 Interactive Fixed Effects (IFE, Bai 2009)

Unlike CCE (which proxies factors via cross-sectional averages), IFE estimates the factors and loadings directly via iterative principal components. It produces pooled slope coefficients (not heterogeneous):

fit_ife <- dcce(
  data = pwt8, unit_index = "country", time_index = "year",
  formula = log_rgdpo ~ log_hc + log_ck + log_ngd,
  model = "ife", n_factors = 2L
)
print(fit_ife)
#> 
#> Dynamic Common Correlated Effects Estimation
#> ============================================
#> Estimator    : Interactive Fixed Effects (Bai 2009)
#> Dep. variable: log_rgdpo
#> Groups (N)   : 93         Time periods (T): 47
#> Observations : 4371      
#> 
#> Mean Group Coefficients 
#> -----------------------
#>                             Coef.  Std. Err.        z     P>|z|   [95% CI         ]   Sig
#> log_hc                     0.7042     0.0711     9.91    0.0000    0.5649    0.8435   ***
#> log_ck                     0.5090     0.0107    47.66    0.0000    0.4880    0.5299   ***
#> log_ngd                    0.1093     0.0222     4.93    0.0000    0.0658    0.1527   ***
#> 
#> R-sq (MG): 0.932   RMSE: 0.0978
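
A minimal sketch of the Bai (2009) alternating iteration on simulated data clarifies the difference from CCE: instead of proxying the factors with CSAs, each pass extracts them from the residuals by principal components and re-estimates the pooled slope on defactored data. This is an illustration under a known single-factor design, not the package's implementation:

```r
set.seed(3)
NN <- 30; TT <- 40; r <- 1
Fmat <- matrix(rnorm(TT * r), TT, r)                    # common factor
Lmat <- matrix(rnorm(NN * r), NN, r)                    # loadings
X <- matrix(rnorm(TT * NN), TT, NN) + Fmat %*% t(Lmat)  # regressor loads on the factor
Y <- 0.5 * X + Fmat %*% t(Lmat) + 0.1 * matrix(rnorm(TT * NN), TT, NN)

beta <- 0
for (it in 1:100) {
  E  <- Y - beta * X                          # residuals at the current beta
  s  <- svd(E)
  Fh <- s$u[, 1:r, drop = FALSE] * sqrt(TT)   # normalised factor estimate
  P  <- Fh %*% t(Fh) / TT                     # projection onto the factor space
  Z  <- Y - P %*% Y                           # defactor the outcome...
  W  <- X - P %*% X                           # ...and the regressor
  beta <- sum(Z * W) / sum(W * W)             # pooled OLS slope
}
beta   # close to the true 0.5; plain pooled OLS would be biased upward here
```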

9.3 Pooled CCE (CCEP)

Pesaran (2006) proposed two CCE estimators: the Mean Group version (CCE-MG, our model = "cce") and the Pooled version (CCEP). CCEP constrains slopes to be identical across units, gaining efficiency when slopes are truly homogeneous:

fit_ccep <- dcce(
  data = pwt8, unit_index = "country", time_index = "year",
  formula = d_log_rgdpo ~ log_hc + log_ck + log_ngd,
  model = "ccep", cross_section_vars = ~ .
)
coef(fit_ccep)
#> (Intercept)      log_hc      log_ck     log_ngd 
#> -0.48493688  0.03676297 -0.03603347 -0.09946014

10 Additional Diagnostics (v0.3.0+)

10.1 Panel Granger causality

The Dumitrescu-Hurlin (2012) test checks whether one variable Granger-causes another in a heterogeneous panel:

gc <- granger_test(
  data = pwt8, unit_index = "country", time_index = "year",
  y = "d_log_rgdpo", x = "log_ck", lags = 1L
)
print(gc)
#> 
#> Dumitrescu-Hurlin Panel Granger Causality Test
#> ==============================================
#> H0: log_ck does not Granger-cause d_log_rgdpo
#> N = 93, T_bar = 48.0, lags = 1
#> 
#> W-bar          =   1.6590
#> Z-bar          =   4.4937    p-value = 0.0000
#> Z-bar tilde    =   3.8382    p-value = 0.0001
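
The headline statistics are linked by a simple standardisation: with K lags, each unit-level Wald statistic is asymptotically \(\chi^2(K)\) under the null, so \(\bar{Z} = \sqrt{N}(\bar{W} - K)/\sqrt{2K}\). Plugging in the printed values reproduces Z-bar up to rounding of W-bar (Z-bar tilde additionally applies a finite-\(T\) correction):

```r
W_bar <- 1.6590; N <- 93; K <- 1
sqrt(N) * (W_bar - K) / sqrt(2 * K)   # approximately 4.49
```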

10.2 IPS and LLC panel unit root tests

panel_ur_test() implements the IPS and LLC tests, standard benchmarks that are not robust to cross-sectional dependence, for comparison with CIPS:

X_lev <- do.call(rbind, by(pwt8, pwt8$country, function(d) d$log_rgdpo))
X_lev <- X_lev[, colSums(is.na(X_lev)) == 0, drop = FALSE]
panel_ur_test(X_lev, test = "ips", lags = 1)
#> 
#> IPS Panel Unit Root Test
#> ============================ 
#> N = 93, T = 48, lags = 1, trend = no
#> H0: all series have a unit root
#> 
#> Z-t-bar =   1.9498    p-value = 0.9744

10.3 Pedroni cointegration test

panel_coint_test(
  data = pwt8, unit_index = "country", time_index = "year",
  formula = log_rgdpo ~ log_hc + log_ck,
  test = "pedroni", lags = 1
)
#> 
#> PEDRONI Panel Cointegration Test
#> ===================================== 
#> N = 93, T_bar = 48.0, lags = 1
#> H0: no cointegration
#> 
#>   Group t       statistic =   -8.5879   p-value = 0.0000
#>   Group rho     statistic = -104.9153   p-value = 0.0000
#> 
#> Large negative statistics reject the null.

10.4 Structural break test

The sup-Wald test for an unknown break date with Andrews (1993) critical values:

brk <- structural_break_test(
  data = pwt8, unit_index = "country", time_index = "year",
  formula = d_log_rgdpo ~ log_hc + log_ck + log_ngd,
  model = "mg", cross_section_vars = NULL,
  type = "unknown", trim = 0.20
)
print(brk)
#> 
#> Structural Break Test (panel, CSD-robust)
#> =========================================
#> Type:          unknown
#> Base model:    mg
#> Sup-Wald:      9.4991  (df = 3, trim = 0.20)
#> Critical vals: 10% = 9.20,  5% = 10.59,  1% = 14.74   (Andrews 1993)
#> p-value:       0.0861
#> Break date:    1983
#> 
#> Pre-break MG coefficients:
#> (Intercept)      log_hc      log_ck     log_ngd 
#>      0.5123     -0.1561     -0.0379      0.0603 
#> 
#> Post-break MG coefficients:
#> (Intercept)      log_hc      log_ck     log_ngd 
#>     -0.2210      0.1428     -0.0291     -0.1382

11 Impulse Response Functions (v0.4.0)

For dynamic models, irf() traces the response of the dependent variable to a unit shock in a regressor:

data(dcce_sim)
fit_dyn <- dcce(
  data = dcce_sim, unit_index = "unit", time_index = "time",
  formula = y ~ L(y, 1) + x,
  model = "dcce", cross_section_vars = ~ ., cross_section_lags = 3
)
ir <- irf(fit_dyn, impulse = "x", horizon = 10, boot_reps = 99, seed = 42)
print(ir)
#> 
#> Impulse Response Function
#> ========================
#> Impulse: x, Horizon: 10
#> 
#>   h      irf    lower    upper
#>   0 0.936175 0.844731 1.034965
#>   1 0.365308 0.309896 0.425711
#>   2 0.142548 0.110798 0.181026
#>   3 0.055624 0.038508 0.077720
#>   4 0.021705 0.013351 0.033373
#>   5 0.008470 0.004629 0.014333
#>   6 0.003305 0.001595 0.006157
#>   7 0.001290 0.000548 0.002648
#>   8 0.000503 0.000188 0.001136
#>   9 0.000196 0.000064 0.000486
#>  10 0.000077 0.000022 0.000208
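
Because the fitted model is an ARDL(1,0), the mean-group IRF decays geometrically: \(\mathrm{irf}(h) = b\,\phi^h\), where \(b\) is the impact coefficient on x and \(\phi\) the AR coefficient. A back-of-envelope check reconstructs the whole point-estimate column from its first two rows; the bootstrap in irf() is needed only for the confidence bands:

```r
b   <- 0.936175                 # impact response at h = 0
phi <- 0.365308 / 0.936175      # ratio of consecutive responses
round(b * phi^(0:10), 6)        # matches the irf column above up to rounding
```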

12 The dcce_workflow() Pipeline

A single function runs all recommended pre-estimation diagnostics and returns a recommended dcce() call:

dcce_workflow(
  data = pwt8, unit_index = "country", time_index = "year",
  formula = log_rgdpo ~ log_hc + log_ck + log_ngd
)

13 Quick-Start Guide

library(dcce)
data(pwt8)

# 1. Detect CSD
fit_ols_qs <- lm(d_log_rgdpo ~ log_hc + log_ck + log_ngd, data = na.omit(pwt8))
pcd_test(residuals(fit_ols_qs), data = na.omit(pwt8),
         unit_index = "country", time_index = "year",
         test = "pesaran")

# 2. Estimate with DCCE
fit_qs <- dcce(
  data               = pwt8,
  unit_index         = "country",
  time_index         = "year",
  formula            = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
  model              = "dcce",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 3
)

# 3. Check residuals
pcd_test(fit_qs, test = "pesaran")

# 4. Tidy output
tidy(fit_qs)
glance(fit_qs)

# 5. Bootstrap
dcce_bootstrap(fit_qs, reps = 499)

14 References

Andrews, Donald W. K. 1993. “Tests for Parameter Instability and Structural Change with Unknown Change Point.” Econometrica 61 (4): 821–56.
Bai, Jushan. 2009. “Panel Data Models with Interactive Fixed Effects.” Econometrica 77 (4): 1229–79.
Bailey, Natalia, George Kapetanios, and M. Hashem Pesaran. 2016. “Exponent of Cross-Sectional Dependence: Estimation and Inference.” Journal of Applied Econometrics 31 (6): 929–60. https://doi.org/10.1002/jae.2490.
———. 2019. “Exponent of Cross-Sectional Dependence for Residuals.” Sankhya B 81 (Suppl 1): 46–102. https://doi.org/10.1007/s13571-019-00196-9.
Chudik, Alexander, Kamiar Mohaddes, M. Hashem Pesaran, and Mehdi Raissi. 2016. “Long-Run Effects in Large Heterogeneous Panel Data Models with Cross-Sectionally Correlated Errors.” Essays in Honor of Man Ying (Aman) Ullah 36: 85–135.
Chudik, Alexander, and M. Hashem Pesaran. 2015. “Common Correlated Effects Estimation of Heterogeneous Dynamic Panel Data Models with Weakly Exogenous Regressors.” Journal of Econometrics 188 (2): 393–420. https://doi.org/10.1016/j.jeconom.2015.03.007.
De Vos, Ignace, Gerdie Everaert, and Vasilis Sarafidis. 2024. “A Method to Evaluate the Rank Condition for CCE Estimators.” Econometric Reviews 43 (2–4). https://doi.org/10.1080/07474938.2023.2292383.
Ditzen, Jan. 2018. “Estimating Dynamic Common-Correlated Effects in Stata.” The Stata Journal 18 (3): 585–617. https://doi.org/10.1177/1536867X1801800302.
Dumitrescu, Elena-Ivona, and Christophe Hurlin. 2012. “Testing for Granger Non-Causality in Heterogeneous Panels.” Economic Modelling 29 (4): 1450–60.
Eberhardt, Markus, and Francis Teal. 2010. “Productivity Analysis in Global Manufacturing Production.” Economics Series Working Papers 515.
Fan, Jianqing, Yuan Liao, and Jiawei Yao. 2015. “Power Enhancement in High-Dimensional Cross-Sectional Tests.” Econometrica 83 (4): 1497–1541. https://doi.org/10.3982/ECTA12749.
Juodis, Arturas. 2022. “Regularizing Bayesian Predictive Regressions.” Journal of Applied Econometrics 37 (4): 676–88. https://doi.org/10.1002/jae.2887.
Juodis, Arturas, and Simon Reese. 2022. “The Incidental Parameters Problem in Testing for Remaining Cross-Section Correlation.” Journal of Business & Economic Statistics 40 (3): 1191–1203. https://doi.org/10.1080/07350015.2021.1906687.
Margaritella, Luca, and Joakim Westerlund. 2023. “Using Information Criteria to Select Averages in CCE.” The Econometrics Journal 26 (3): 405–21. https://doi.org/10.1093/ectj/utad005.
Pedroni, Peter. 1999. “Critical Values for Cointegration Tests in Heterogeneous Panels with Multiple Regressors.” Oxford Bulletin of Economics and Statistics 61 (S1): 653–70.
Pesaran, M. Hashem. 2006. “Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error Structure.” Econometrica 74 (4): 967–1012. https://doi.org/10.1111/j.1468-0262.2006.00692.x.
———. 2007. “A Simple Panel Unit Root Test in the Presence of Cross-Section Dependence.” Journal of Applied Econometrics 22 (2): 265–312. https://doi.org/10.1002/jae.951.
———. 2015. “Testing Weak Cross-Sectional Dependence in Large Panels.” Econometric Reviews 34 (6–10): 1089–1117. https://doi.org/10.1080/07474938.2014.956623.
Pesaran, M. Hashem, and Ron Smith. 1995. “Estimating Long-Run Relationships from Dynamic Heterogeneous Panels.” Journal of Econometrics 68 (1): 79–113. https://doi.org/10.1016/0304-4076(94)01644-F.
Pesaran, M. Hashem, and Yang Xie. 2021. “Identifying the Effects of Sanctions on the Iranian Economy Using GVAR.”
Pesaran, M. Hashem, and Takashi Yamagata. 2008. “Testing Slope Homogeneity in Large Panels.” Journal of Econometrics 142 (1): 50–93.
Shin, Yongcheol, M. Hashem Pesaran, and Ron Smith. 1999. “An Autoregressive Distributed-Lag Modelling Approach to Cointegration Analysis.” Econometrics and Economic Theory in the 20th Century, 371–413.
Swamy, P. A. V. B. 1970. “Efficient Inference in a Random Coefficient Regression Model.” Econometrica 38 (2): 311–23.