The xiacf package provides a robust framework for detecting complex non-linear and functional dependence in time series data. Traditional linear metrics, such as the standard Autocorrelation Function (ACF) and Cross-Correlation Function (CCF), often fail to detect symmetrical or purely non-linear relationships.
This package overcomes these limitations by utilizing
Chatterjee’s Rank Correlation (\(\xi\)), offering both univariate
(\(\xi\)-ACF) and multivariate (\(\xi\)-CCF) analysis tools. It features
rigorous statistical hypothesis testing powered by advanced surrogate
data generation algorithms (IAAFT and MIAAFT), all implemented in
high-performance C++ using RcppArmadillo.
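To make the statistic concrete, here is a minimal base-R sketch of Chatterjee's \(\xi\) applied to a lagged pair \((x_t, x_{t+k})\). It assumes the data contain no ties, and the helper names `xi_pair` and `xi_at_lag` are hypothetical. The package's C++ implementation may handle ties and edge cases differently.

```r
# Chatterjee's xi for paired samples (x, y), assuming no ties.
# Sort the pairs by x, rank the reordered y, and measure how much
# consecutive ranks jump: small jumps mean y is close to a function of x.
xi_pair <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

# Hypothetical helper: xi between x_t and x_{t+k} (a single xi-ACF value).
xi_at_lag <- function(x, k) {
  n <- length(x)
  xi_pair(x[1:(n - k)], x[(k + 1):n])
}

# Logistic map in the fully chaotic regime (r = 4)
n <- 500
x <- numeric(n)
x[1] <- 0.1
for (t in 1:(n - 1)) x[t + 1] <- 4 * x[t] * (1 - x[t])

xi_lag1  <- xi_at_lag(x, 1)            # close to 1: x_{t+1} is a function of x_t
acf_lag1 <- cor(x[1:(n - 1)], x[2:n])  # linear correlation at the same lag
```

For a perfectly monotone, tie-free sample of size n, this estimator equals (n - 2) / (n + 1), so it approaches 1 only as n grows.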
You can install the development version of xiacf from GitHub with:
# install.packages("remotes")
remotes::install_github("yetanothersu/xiacf")
(Note: CRAN submission is currently pending. Once accepted, you can install it via install.packages("xiacf").)
Here is a basic example showing how to compute and visualize the \(\xi\)-ACF against a standard linear ACF.
library(xiacf)
library(ggplot2)
# Generate a chaotic Logistic Map: x_{t+1} = r * x_t * (1 - x_t)
set.seed(42)
n <- 500
x <- numeric(n)
x[1] <- 0.1 # Initial condition
r <- 4.0 # Fully chaotic regime
for (t in 1:(n - 1)) {
  x[t + 1] <- r * x[t] * (1 - x[t])
}
# 1. Run the Xi-ACF test
# Computes up to 20 lags with 100 IAAFT surrogates for significance testing
results <- xi_acf(x, max_lag = 20, n_surr = 100)
# Print summary
print(results)
#>
#> Chatterjee's Xi-ACF Test
#>
#> Data length: 500
#> Max lag: 20
#> Significance: 95% (IAAFT, n_surr = 100)
#>
#> Lag          ACF           Xi Xi_Threshold  Xi_Excess
#>   1 -0.094245571  0.988012048   0.04806988 0.93994217
#>   2 -0.002595258  0.976036580   0.04447587 0.93156071
#>   3  0.022361912  0.952317334   0.04603879 0.90627854
#>   4  0.014398212  0.906530090   0.05262789 0.85390220
#>   5 -0.031941140  0.820703278   0.05262117 0.76808211
#>   6 -0.058549287  0.668178745   0.05728133 0.61089741
#>   7 -0.011438562  0.448874296   0.04287672 0.40599758
#>   8  0.005621485  0.211267315   0.03568038 0.17558693
#>   9 -0.060470919  0.102974116   0.03770035 0.06527377
#>  10  0.022159076 -0.016568166   0.03604263 0.00000000
#>  11 -0.045376715  0.032226497   0.05170479 0.00000000
#>  12 -0.072209612  0.030120558   0.04698102 0.00000000
#>  13  0.007066940  0.001429367   0.04629756 0.00000000
#>  14 -0.010218697  0.021486484   0.04490209 0.00000000
#>  15 -0.050879955  0.017715029   0.05063131 0.00000000
#>  16  0.013980615 -0.003368124   0.05575847 0.00000000
#>  17 -0.001535158  0.007055657   0.04938831 0.00000000
#>  18 -0.002734892 -0.040056301   0.04835100 0.00000000
#>  19 -0.004490956  0.001244813   0.04317233 0.00000000
#>  20  0.030634877 -0.012256998   0.04423478 0.00000000
# 2. Visualize the results
# The autoplot method returns a ggplot2 object.
# Lags whose Xi exceeds the per-lag surrogate threshold are
# highlighted with filled red triangles.
autoplot(results)
Comparison between standard linear ACF and Chatterjee’s Xi-ACF.
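The significance threshold in the output above is based on IAAFT surrogates. As a rough illustration of the idea, here is a base-R sketch of the textbook IAAFT iteration. The function name `iaaft_sketch`, the fixed iteration count, and the AR(1) test series are assumptions for illustration. The package's C++ routine may use different stopping rules.

```r
# One IAAFT surrogate: same values as x, approximately the same amplitude
# spectrum, but with the phase structure randomized.
iaaft_sketch <- function(x, n_iter = 50) {
  n <- length(x)
  target_amp <- Mod(fft(x))   # amplitude spectrum to preserve
  sorted_x <- sort(x)         # value distribution to preserve
  s <- sample(x)              # start from a random shuffle
  for (i in seq_len(n_iter)) {
    # Step 1: keep the current phases, impose the target amplitudes
    phases <- Arg(fft(s))
    s <- Re(fft(target_amp * exp(1i * phases), inverse = TRUE)) / n
    # Step 2: rank-remap back onto the original values
    s <- sorted_x[rank(s, ties.method = "first")]
  }
  s
}

set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.7), n = 256))
s <- iaaft_sketch(x)
```

Because the last step is the rank remap, the surrogate is an exact permutation of x. Computing \(\xi\) on many such surrogates and taking an upper quantile gives a null threshold of the kind reported in Xi_Threshold; that reading of the column is an assumption, not something the output itself states.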
The standard CCF relies on linear correlation, which is a symmetric
measure, whereas xi_ccf() evaluates the directional
non-linear lead-lag relationship. By default
(bidirectional = TRUE), it computes both “\(X\) leads \(Y\)” and “\(Y\) leads \(X\)” in a single call. Because the
optimized C++ engine reuses the MIAAFT surrogates, the reverse direction
comes at essentially no additional cost.
# Generate a pure non-linear lead-lag relationship
# Y is driven by the absolute value of X from 3 periods ago.
set.seed(42)
n <- 300
# A uniform distribution centered at 0 ensures the linear cross-correlation is zero
X <- runif(n, min = -2, max = 2)
Y <- numeric(n)
for (t in 4:n) {
  Y[t] <- abs(X[t - 3]) + rnorm(1, sd = 0.2)
}
# Run the bidirectional Xi-CCF test
ccf_results <- xi_ccf(x = X, y = Y, max_lag = 10, n_surr = 100)
# Visualize the results
# The autoplot method produces a 2-panel graph showing directional dependence.
# The standard CCF misses the V-shaped relationship, but Xi-CCF detects that X leads Y by 3 periods.
autoplot(ccf_results)
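The directional behaviour can be checked by hand at a single lag. The sketch below uses a tie-free \(\xi\) estimator in base R and evaluates both directions at lag 3 on data built like the example above. The `xi_pair` helper is hypothetical and not part of the package, and the noise is drawn in vectorized form, so the simulated values differ from those in the example.

```r
# Tie-free Chatterjee's xi: how well y is explained as a function of x.
xi_pair <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

set.seed(42)
n <- 300
X <- runif(n, min = -2, max = 2)
# Y_t = |X_{t-3}| + noise for t >= 4; the first three values are pure noise
Y <- c(rnorm(3, sd = 0.2), abs(X[1:(n - 3)]) + rnorm(n - 3, sd = 0.2))

lag <- 3
past   <- 1:(n - lag)
future <- (lag + 1):n

xi_x_leads_y <- xi_pair(X[past], Y[future])  # Y_t as a function of X_{t-3}
xi_y_leads_x <- xi_pair(Y[past], X[future])  # X_t as a function of Y_{t-3}
lin_cor      <- cor(X[past], Y[future])      # linear correlation at the same lag
```

Unlike Pearson correlation, \(\xi\) is not symmetric in its arguments, which is why the two directions have to be evaluated separately.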
For advanced market microstructure or structural break detection, you
can run rolling \(\xi\)-ACF or \(\xi\)-CCF analyses. These functions support
robust parallel processing via the future ecosystem and
seamlessly integrate with timestamps for intuitive visualization.
library(ggplot2)
# Generate dummy time series data with a structural break
set.seed(123)
dates <- seq(as.Date("2020-01-01"), by = "1 day", length.out = 300)
X <- rnorm(300)
Y <- numeric(300)
# First half (Day 1-150): X leads Y by 3 days (non-linear relationship)
Y[1:150] <- c(rnorm(3), abs(X[1:147])) + rnorm(150, sd = 0.1)
# Second half (Day 151-300): The relationship breaks down (pure noise)
Y[151:300] <- rnorm(150)
# Run rolling bidirectional Xi-CCF with time_index
rolling_res <- run_rolling_xi_ccf(
  x = X,
  y = Y,
  time_index = dates, # Pass the dates directly!
  window_size = 100,
  step_size = 5,
  max_lag = 5,
  n_surr = 50,
  n_cores = 2 # Set to NULL for sequential execution
)
# Visualize the dynamic relationship as a beautiful heatmap
ggplot(rolling_res, aes(x = Window_End_Time, y = Lag, fill = Xi_Excess)) +
  geom_tile() +
  scale_fill_gradient2(low = "white", high = "firebrick", mid = "white", midpoint = 0) +
  facet_wrap(~Direction, ncol = 1) +
  scale_x_date(date_labels = "%Y-%m") +
  labs(
    title = "Rolling Bidirectional Xi-CCF Heatmap",
    subtitle = "Detecting structural breaks in non-linear lead-lag dynamics",
    x = "Date",
    fill = "Excess Xi"
  ) +
  theme_minimal()
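To see what window_size and step_size imply for the example above, the following sketch enumerates the windows in base R. It assumes windows start at observation 1, advance by step_size, and are labelled by the date of their last observation. This layout is an assumption for illustration, and the package may align its windows differently.

```r
n <- 300
window_size <- 100
step_size <- 5
dates <- seq(as.Date("2020-01-01"), by = "1 day", length.out = n)

# First and last observation index of each rolling window
starts <- seq(1, n - window_size + 1, by = step_size)
ends <- starts + window_size - 1

windows <- data.frame(
  start = dates[starts],
  end = dates[ends],
  # Share of each window that falls in the first regime (days 1-150)
  frac_dependent = pmax(0, pmin(ends, 150) - starts + 1) / window_size
)
```

Under these assumptions there are 41 windows. Windows that straddle day 150 mix both regimes, so the measured dependence should fade gradually rather than vanish at a single date.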
xi_matrix()
For datasets with more than two variables, computing pairwise relationships one by one is computationally expensive due to the combinatorial explosion of surrogate generation.
In v0.3.x, we introduced xi_matrix(), which
leverages an n-dimensional MIAAFT C++ engine to compute
all directional relationships simultaneously. It generates the
multivariate surrogate matrix only once per iteration, allowing
for blazing-fast network causal discovery.
Let’s test it on a simulated non-linear causal chain (\(A \to B \to C\)):
1. A: Independent noise.
2. B: Depends on the square of A from 2 steps ago.
3. C: Depends on the absolute value of B from 1 step ago.
# Generate a chain of non-linear causality
set.seed(42)
n <- 300
A <- runif(n, min = -2, max = 2)
B <- numeric(n)
C <- numeric(n)
for (t in 1:n) {
  if (t >= 3) B[t] <- A[t - 2]^2 + rnorm(1, sd = 0.5)
  if (t >= 2) C[t] <- abs(B[t - 1]) + rnorm(1, sd = 0.5)
}
df_network <- data.frame(A, B, C)
# Compute the multivariate Xi-correlogram matrix (99% confidence level)
res_matrix <- xi_matrix(df_network, max_lag = 5, n_surr = 100, sig_level = 0.99)
You can visualize the entire network of causal relationships, including the direct links (\(A \to B\), \(B \to C\)) and the indirect ripple effect (\(A \to C\) at Lag 3), using the elegant autoplot method.
# The plot will automatically highlight significant points with filled red triangles!
autoplot(res_matrix)
This project is licensed under the MIT License - see the LICENSE file for details.