| Title: | Augmented Modelling of Relational Events |
| Version: | 1.0.0 |
| Description: | Utilities for simulating and prototyping relational event models, including helpers to generate dynamic event sequences and covariate processes for sender and receiver sets. The endogenous-effect and case-control estimation machinery follows Juozaitiene and Wit (2024) <doi:10.1093/jrsssa/qnae132>. |
| License: | MIT + file LICENSE |
| Depends: | R (≥ 4.2.0) |
| URL: | https://franciscorichter.github.io/amorem/, https://github.com/franciscorichter/amorem |
| Imports: | splines, stats, Rcpp, survival, withr |
| LinkingTo: | Rcpp |
| Suggests: | torch, testthat (≥ 3.1.0), mgcv, coxme, roxygen2 (≥ 7.3.0), pkgdown (≥ 2.0.0), knitr, rmarkdown |
| Encoding: | UTF-8 |
| LazyData: | true |
| Config/testthat/edition: | 3 |
| RoxygenNote: | 7.3.3 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | yes |
| Packaged: | 2026-06-22 19:25:02 UTC; pancho |
| Author: | Francisco Richter [aut, cre], Martina Boschi [aut], Ernst C. Wit [aut], Melania Lembo [aut] |
| Maintainer: | Francisco Richter <richtf@usi.ch> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-29 13:50:12 UTC |
amorem: Augmented Modelling of Relational Events
Description
Utilities for simulating and prototyping relational event models, including helpers to generate dynamic event sequences and covariate processes for sender and receiver sets. The endogenous-effect and case-control estimation machinery follows Juozaitiene and Wit (2024) doi:10.1093/jrsssa/qnae132.
Author(s)
Maintainer: Francisco Richter richtf@usi.ch
Authors:
Martina Boschi
Ernst C. Wit
Melania Lembo
See Also
Useful links:
Attach static covariates to an event log
Description
This helper augments an event log with sender and/or receiver covariates that live in separate lookup tables. It is designed for static covariates (one row per actor). Dynamic covariates should be merged manually before calling this helper.
Usage
attach_static_covariates(
event_log,
sender_covariates = NULL,
receiver_covariates = NULL,
actor_col = "actor",
sender_prefix = "sender_",
receiver_prefix = "receiver_",
allow_missing = TRUE
)
Arguments
event_log |
A standardized event log containing columns |
sender_covariates, receiver_covariates |
Data frames with one row per
actor. Each must include the identifier column specified by |
actor_col |
Name of the identifier column inside the covariate tables. |
sender_prefix, receiver_prefix |
Prefixes applied to the appended covariate column names. |
allow_missing |
Logical; if |
Value
The input event_log with additional columns for each covariate table
supplied.
Classroom actor attributes (McFarland 2001)
Description
Per-actor covariates for the classroom_events event stream.
Usage
classroom_actors
Format
A data frame with 20 rows and 3 columns:
- id
Character actor id matching the
sender/receivercolumns of classroom_events.- sex
Factor
"F"/"M"— biological sex.- role
Factor with levels
"instructor","grade_11","grade_12".
Source
McFarland (2001), via networkDynamic. See classroom_events.
Classroom interaction events (McFarland 2001)
Description
Time-stamped directed interactions among 20 individuals in a single
US high-school class session, recorded on 16 October 1996 by Daniel
McFarland. The same data appear in the networkDynamic R package as
McFarland_cls33_10_16_96; this is a tidy event-table form.
Usage
classroom_events
Format
A data frame with 691 rows and 5 columns:
- time
Minutes since the start of the class period.
- sender
Character actor id matching classroom_actors
$id.- receiver
Character actor id matching classroom_actors
$id.- interaction_type
Factor with levels
"social","sanction","task".- weight
Integer weight of the interaction.
Source
McFarland, D. (2001). Student resistance: How the formal and
informal organization of classrooms facilitate everyday forms of
student defiance. American Journal of Sociology 107(3), 612–678.
doi:10.1086/338779. Redistributed via the networkDynamic R
package (CRAN), dataset McFarland_cls33_10_16_96.
See Also
CollegeMsg: private messages on a university online community
Description
Directed time-stamped instant messages between students of the University of California, Irvine over 193 days in 2004. Each row is one message. Sourced from the SNAP repository.
Usage
college_msg
Format
A data frame with 59,835 rows and 3 columns:
- time
Days since the first message.
attr(., "unix_origin")holds the Unix epoch oftime = 0.- sender
Character user id.
- receiver
Character user id.
Source
Panzarasa, P., Opsahl, T., Carley, K. (2009). Patterns and dynamics of users' behavior and interaction: Network analysis of an online community. Journal of the American Society for Information Science and Technology 60(5), 911–932. Distributed via SNAP: https://snap.stanford.edu/data/CollegeMsg.html.
Compare candidate endogenous specifications by AIC
Description
Superseded by rem(), the unified front-end for fitting relational
event models on preprocessed case-control data. compare_models() remains
fully supported.
Convenience wrapper that runs the canonical case-control / no-intercept
binomial-GLM recipe on every specification in models and returns a
tidy AIC comparison table. One case-control sample is drawn from
event_log and shared across every specification so that the AIC
values are directly comparable.
Each specification is a character vector of stat names accepted by
endogenous_features(). The function computes the union of
all stats once, builds case-minus-control differences, and fits one
binomial GLM per specification with the appropriate subset of
columns. The fitted models are equivalent to the partial-likelihood
parametrisation used in case-control REM inference
(Vu et al. 2017; Juozaitienė & Wit 2024).
For n_controls = 1 the helper fits a no-intercept binomial GLM
on case-minus-control differences. For n_controls > 1 it falls
back to survival::clogit() — a true conditional-logistic fit
that correctly handles multiple controls per stratum. The
survival package is in Suggests and is required only when
n_controls > 1.
Usage
compare_models(
event_log,
models,
n_controls = 1,
scope = c("all", "appearance", "citation"),
mode = c("one", "two"),
random_effects = NULL,
half_life = NULL,
seed = NULL,
keep_fits = FALSE
)
Arguments
event_log |
Data frame with |
models |
Named list of character vectors. Each entry names one
candidate specification; the vector contents are the
endogenous statistics it includes. Stats must be valid names for
|
n_controls |
Number of controls per case in
|
scope, mode |
Passed through to |
random_effects |
Optional character vector. May be any of
|
half_life |
Required when any specification contains an exp-decay stat. Shared across all specs that use one. |
seed |
Optional integer seed for the case-control sample. |
keep_fits |
Logical; when |
Value
A data frame with one row per specification and columns
model, n_terms, n_obs, log_lik, AIC, delta_AIC. Sorted
ascending by AIC. The model with the lowest AIC has
delta_AIC = 0.
References
Juozaitienė R, Wit EC (2024). It's about time: revisiting reciprocity and triadicity in relational event analysis. Journal of the Royal Statistical Society Series A 188(4), 1246-1262. doi:10.1093/jrsssa/qnae132.
See Also
endogenous_features(), sample_non_events().
Examples
data(classroom_events)
compare_models(
classroom_events,
models = list(
count = c("reciprocity_count", "transitivity_count"),
continuous = c("reciprocity_time_recent",
"transitivity_time_recent"),
interrupted = c("reciprocity_time_recent_interrupted",
"transitivity_time_recent_interrupted")),
seed = 11)
Compare REM specifications with global covariate effects
Description
Superseded by rem(), the unified front-end for fitting relational
event models on preprocessed case-control data. compare_models_global()
remains fully supported.
Implements the time-shifted partial likelihood of Lembo,
Juozaitienė, Vinciotti & Wit (2025) for fitting relational event
models with global covariate effects — covariates that are
time-dependent but constant across all interacting pairs (e.g.\
temperature, time of day, the residual baseline hazard). Standard
case-control partial likelihood cannot identify these because
global terms cancel in the rate ratio; this function follows the
paper's Section 4 recipe: a random per-dyad time shift breaks the
cancellation, and with one non-event per event the partial
likelihood reduces to a degenerate logistic additive model fit by
mgcv::gam().
Per the paper's equations 11-13:
\mathcal{L}^{PS}(f, g) = \prod_{k=1}^n
\frac{\exp\{\Delta_k(f; x_{s_k r_k}) + \Delta_k(g; x_k)\}}
{1 + \exp\{\Delta_k(f; x_{s_k r_k}) + \Delta_k(g; x_k)\}}
where each \Delta_k is the difference between the
(smooth) function evaluated at the focal event time and at the
sampled non-event's shifted time
t^*_k = t_k - h_{s^* r^*}.
Shift distribution. Per-dyad shifts H_{sr} are drawn
independently from an exponential distribution with mean
\nu \cdot \bar{\Delta t} where
\bar{\Delta t} is the average inter-arrival time
in event_log. The paper's simulation studies find that
\nu = 1 works in practice and that the estimates
are robust to choices in [0.1, 10].
Specification format. Each entry of models is a named character
vector mapping a covariate name (a statistic in
endogenous_features() or a column of
global_covariates) to an effect type:
-
"linear"– linearbeta * xterm. -
"nl"– smooths(x)(thin-plate, paper's default). -
"tv"– smooths(time, by = x)(time-varying). -
"tvnl"– tensor productte(time, x). -
"global_smooth"– smooths(x_global)evaluated at the focal time vs. the non-event's shifted time (the paper'sg_b(x^{(b)}(t))family). -
"global_cyclic"– cyclic smooths(x_global, bs = "cc")for time-of-day-like covariates with a periodic domain. -
"global_time"– a smooth ontimeitself, recovering the residual time effectg_0(t)of paper eq. 3.
Usage
compare_models_global(
event_log,
models,
global_covariates = NULL,
scope = c("all", "appearance", "citation"),
mode = c("one", "two"),
half_life = NULL,
shift_scale = 1,
k = NULL,
k_cyclic = 10,
seed = NULL,
keep_fits = FALSE
)
Arguments
event_log |
Data frame with |
models |
Named list of specifications (see "Specification format" above). |
global_covariates |
Optional data frame with a |
scope, mode |
Passed through to |
half_life |
Required when any dyadic spec uses an exp-decay stat. |
shift_scale |
Multiplier on the average inter-arrival time for the exponential shift distribution. Defaults to 1. |
k |
Optional knot count for smooth terms (see
|
k_cyclic |
Knot count for |
seed |
Integer seed for the case-control sample and the shift draws. |
keep_fits |
Logical; when |
Value
Data frame with one row per specification and columns
model, n_terms, n_obs, log_lik, AIC, delta_AIC.
References
Lembo M, Juozaitienė R, Vinciotti V, Wit EC (2025). Relational event models with global covariates: an application to bike sharing. Journal of the Royal Statistical Society, Series C. doi:10.1093/jrsssc/qlaf058.
See Also
compare_models() (linear, no globals),
compare_models_smooth() (smooth dyadic effects, no globals).
Examples
data(classroom_events)
# Hourly temperature track on the same time axis:
g <- data.frame(time = seq(0, max(classroom_events$time), length = 50),
temperature = rnorm(50, 20, 5))
compare_models_global(
classroom_events,
models = list(
dyadic_only = c(reciprocity_count = "linear",
transitivity_count = "linear"),
with_global = c(reciprocity_count = "linear",
transitivity_count = "linear",
temperature = "global_smooth",
time = "global_time")),
global_covariates = g,
seed = 11, k = 5)
Compare candidate specifications with smooth (TV / NL / TVNL) effects
Description
Superseded by rem(), which fits the same smooth (TV / NL / TVNL)
effects on preprocessed case-control data. compare_models_smooth() remains
fully supported.
Mirrors compare_models() but lets each statistic in a specification
take one of four effect types instead of a single linear coefficient:
linear, time-varying (TV), non-linear (NL), or jointly time-varying
non-linear (TVNL). The smooth machinery follows
Boschi, Lerner & Wit (2025); the matrix-of-event-vs-non-event trick
is documented in their Section 3.3.
For each specification:
One case-control sample is drawn from
event_logwithn_controls = 1(paired event / non-event design).For every requested statistic, both the case (event) and the control (non-event) features are computed via
endogenous_features().The mgcv design uses the case-vs-control matrix trick:
linear -> a single coefficient on
case - control(columnd_stat).tv ->
s(time, by = d_stat)— smooth in time, multiplied byd_stat.nl ->
s(stat_mat, by = I_mat)wherestat_matis a two-column matrixcbind(case, control)andI_matiscbind(1, -1).tvnl ->
te(time_mat, stat_mat, by = I_mat)tensor product smooth, with time_mat both columns equal to the event time vector.
The model is fitted with
mgcv::gamand a degenerate logistic likelihood: response =rep(1, n), formula =one ~ -1 + ...,family = binomial. This matches Boschi et al. equation 8.
AIC values are directly comparable across specifications because every
fit uses the same case-control sample. Returns the same tidy
data.frame as compare_models().
Usage
compare_models_smooth(
event_log,
models,
scope = c("all", "appearance", "citation"),
mode = c("one", "two"),
half_life = NULL,
k = NULL,
seed = NULL,
keep_fits = FALSE
)
Arguments
event_log |
Data frame with |
models |
Named list of specifications. Each entry is itself a
named character vector (or named list) mapping statistic names to
effect types:
list(
linear = c(reciprocity_count = "linear",
transitivity_count = "linear"),
nl = c(reciprocity_time_recent = "nl",
transitivity_time_recent = "nl"),
tvnl = c(reciprocity_time_recent = "tvnl",
transitivity_time_recent = "tvnl"))
|
scope, mode |
Passed through to |
half_life |
Required when an exp-decay statistic is requested. |
k |
Optional integer: knot count for |
seed |
Integer seed for the case-control sample. |
keep_fits |
Logical; when |
Value
Data frame with one row per specification and columns
model, n_terms, n_obs, log_lik, AIC, delta_AIC.
References
Boschi M, Lerner J, Wit EC (2025). Beyond Linearity and Time-Homogeneity: Relational Hyper Event Models with Time-Varying Non-Linear Effects. arXiv:2509.05289.
See Also
compare_models() for the linear-only variant.
Examples
data(classroom_events)
compare_models_smooth(
classroom_events,
models = list(
linear = c(reciprocity_time_recent = "linear",
transitivity_time_recent = "linear"),
nl = c(reciprocity_time_recent = "nl",
transitivity_time_recent = "nl"),
tvnl = c(reciprocity_time_recent = "tvnl",
transitivity_time_recent = "tvnl")),
seed = 11)
Endogenous statistics with a compiled fast path
Description
Returns the names of the endogenous statistics that
endogenous_features() evaluates with the compiled C++ engine.
Statistics outside this set are computed by the (slower) pure-R fallback.
Value
A character vector of statistic names.
See Also
Examples
length(cpp_supported_stats())
head(cpp_supported_stats())
US state distance matrix
Description
A 56 × 56 matrix of pairwise geographic distances (in metres) between
US states and territories, computed from boundary geometries via
sf::st_distance.
Usage
dist_matrix
Format
A numeric matrix with 56 rows and 56 columns. Row and column names are state/territory names.
Source
Computed from US Census TIGER/Line shapefiles using the tigris, sf, and geosphere packages. See Walker (2024), Pebesma (2018), Hijmans (2022).
Email-Eu-Core temporal (single-department subset)
Description
Internal emails between members of a single department of a large European research institution over ~803 days. The dataset has been filtered to remove self-loops. Sourced from the SNAP repository as a single-department slice of the email-Eu-core-temporal dataset.
Usage
email_eu_core
Format
A data frame with 12,216 rows and 3 columns:
- time
Days since the first email in the recording window.
- sender
Character employee id (anonymised).
- receiver
Character employee id (anonymised).
Source
Paranjape, A., Benson, A.R., Leskovec, J. (2017). Motifs in temporal networks. WSDM '17, 601–610. doi:10.1145/3018661.3018731. Distributed via SNAP: https://snap.stanford.edu/data/email-Eu-core-temporal.html.
Compute endogenous event-network statistics
Description
Given a standardized relational event log, this helper derives historical statistics for each event based on the evolving network. The statistics follow the taxonomy of Juozaitienė and Wit (2025, JRSS-A) and cover reciprocity, transitivity, cyclic closure, sending balance and receiving balance. All definitions use the continuous convention (effects persist even after a closure event).
Usage
endogenous_features(
event_log,
stats = c("sender_outdegree", "receiver_indegree", "reciprocity", "recency"),
half_life = NULL,
sort = TRUE,
history_log = NULL,
prior_log = NULL
)
Arguments
event_log |
A data.frame containing at least |
stats |
Character vector of statistics to compute. See Details for the full list of allowed values. |
half_life |
Positive numeric; the half-life parameter |
sort |
Logical; when |
history_log |
Optional data.frame giving the authoritative event
history (columns |
prior_log |
Optional data.frame of events that precede the study window
(columns |
Details
All statistics are evaluated immediately before the event is logged. They are grouped into five families.
Degree / baseline:
sender_outdegreeNumber of events previously sent by the sender.
receiver_indegreeNumber of events previously received by the receiver.
recencyElapsed time since the last event on the same ordered pair;
NAwhen the dyad is brand new.
Reciprocity — reverse-dyad (receiver \to sender) history:
reciprocity/reciprocity_binary1 if the reverse dyad has ever been observed, 0 otherwise.
reciprocity_countTotal count of past reverse-dyad events.
reciprocity_exp_decayExponentially weighted sum of past reverse-dyad events (requires
half_life).reciprocity_time_recentElapsed time since the most recent reverse-dyad event;
NAif none.reciprocity_time_firstElapsed time since the first reverse-dyad event;
NAif none.
Transitivity — two-path s \to k \to r:
transitivity_binary1 if any intermediary
kexists with both(s,k)and(k,r)beforet.transitivity_countNumber of such intermediaries.
transitivity_binary_orderedLike binary but requiring
(s,k)to precede(k,r).transitivity_count_orderedCount with order restriction.
transitivity_exp_decayExp-decay weighted sum over two-paths (requires
half_life).transitivity_exp_decay_orderedExp-decay with order restriction.
transitivity_time_recentTime since the most recently completed two-path;
NAif none.transitivity_time_firstTime since the earliest two-path;
NAif none.transitivity_time_recent_orderedTime since the most recent ordered two-path;
NAif none.transitivity_time_first_orderedTime since the earliest ordered two-path;
NAif none.
Cyclic closure — two-path r \to k \to s, closed by
s \to r:
cyclic_binary1 if any cyclic two-path exists.
cyclic_countNumber of cyclic intermediaries.
cyclic_time_recentTime since the most recent cyclic two-path formation;
NAif none.cyclic_time_firstTime since the first cyclic two-path formation;
NAif none.
Sending balance — shared target: both s \to k and r \to k
exist:
sending_balance_binary1 if any shared target exists.
sending_balance_countNumber of shared targets.
sending_balance_time_recentTime since the most recent shared-target two-path formation;
NAif none.sending_balance_time_firstTime since the first shared-target two-path formation;
NAif none.
Receiving balance — shared source: both k \to s and
k \to r exist:
receiving_balance_binary1 if any shared source exists.
receiving_balance_countNumber of shared sources.
receiving_balance_time_recentTime since the most recent shared-source two-path formation;
NAif none.receiving_balance_time_firstTime since the first shared-source two-path formation;
NAif none.
The statistic "sender_receivers_set" is special: it adds a list-column
in which each element is the character vector of receivers the row's sender
has reached before that row (the building block for set-valued endogenous
covariates, e.g. an alien species' previously invaded regions). It honours
history_log, so it can be computed for sampled non-events without those
non-events polluting the history.
Value
The event log with added columns, one per requested statistic
(sender_receivers_set is added as a list-column).
Examples
data(classroom_events)
feats <- endogenous_features(classroom_events,
stats = c("reciprocity", "recency"))
head(feats)
GOF test for an auxiliary (unmodelled) statistic
Description
Implements the auxiliary-statistic test of Boschi & Wit (2025),
Section 3.7 / eq. 20. Tests whether a covariate auxiliary that is
not part of model has nonetheless been adequately captured
indirectly by the fitted model. Uses the simulation-based p-value
described in the paper: n_sim replicates of
G^*[\hat\gamma, u] are drawn from i.i.d. standard
normals, the test statistic
T_\phi = \sup_u |G[\hat\gamma, u]| is
computed, and the empirical p-value is the fraction of replicates
with T_{\phi,b}^* \ge T_\phi.
Usage
gof_auxiliary(
event_log,
model,
auxiliary,
n_sim = 1000,
scope = "all",
mode = "one",
half_life = NULL,
seed = NULL
)
Arguments
event_log |
Dyadic event log. |
model |
Named character vector of |
auxiliary |
Name of the statistic to test as an unmodelled
feature; must be a statistic computable by
|
n_sim |
Number of Monte Carlo replicates (default 1000). |
scope, mode, half_life, seed |
See |
Value
List with statistic (T_\phi), p_value,
G, u, and auxiliary.
References
Boschi M, Wit EC (2025). Goodness of fit in relational event models. Statistics and Computing 36(4).
Omnibus GOF test via Cauchy combination
Description
Implements the omnibus test of Boschi & Wit (2025), Section 3.6 /
eq. 19. Runs gof_univariate() per covariate in model, then
combines the resulting p-values via the Cauchy combination
T_o = \tfrac{1}{L}\sum_l \tan(\pi(0.5 - P_l))
(Liu & Xie 2020), with analytic p-value
\tfrac{1}{2} - \arctan(T_o)/\pi.
Usage
gof_global(
event_log,
model,
scope = "all",
mode = "one",
half_life = NULL,
seed = NULL
)
Arguments
event_log |
Dyadic event log. |
model |
Named character vector of |
scope, mode, half_life, seed |
See |
Value
List with statistic (T_o), p_value,
and components (per-covariate data.frame with covariate,
statistic, p_value).
References
Boschi M, Wit EC (2025). Goodness of fit in relational event models. Statistics and Computing 36(4).
Multivariate GOF test for smooth or random-effect covariates
Description
Implements the multivariate test of Boschi & Wit (2025), Section 3.4.
Builds a q-dimensional cumulative residual process from the spline
basis of the requested covariate's smooth effect, normalises by the
inverse-square-root of the empirical variance-covariance matrix
\hat J (eq. 17), and tests against a q-dimensional
standard Brownian bridge via T_\psi = \sup_u \lVert\hat W\rVert^2.
The p-value is computed empirically by simulating n_sim Brownian
bridge trajectories.
Usage
gof_multivariate(
event_log,
model,
covariate,
k_basis = 5,
n_sim = 1000,
scope = "all",
mode = "one",
half_life = NULL,
seed = NULL
)
Arguments
event_log |
Dyadic event log. |
model |
Named character vector of |
covariate |
Name of the covariate to test under a smooth effect. |
k_basis |
Spline-basis dimension for |
n_sim |
Number of simulated Brownian bridges for the empirical p-value (default 1000). |
scope, mode, half_life, seed |
See |
Value
List with statistic (T_\psi), p_value,
W (n x q matrix), u, and covariate.
References
Boschi M, Wit EC (2025). Goodness of fit in relational event models. Statistics and Computing 36(4).
Goodness-of-fit test for a single FLE covariate
Description
Implements the univariate cumulative martingale residual test of
Boschi & Wit (2025), Section 3.3. The test statistic is
T_x = \sup_u |\hat W[u]| where
\hat W[u] is the normalised cumulative score process for
the requested covariate; under correct specification \hat W
converges to a standard Brownian bridge, so the p-value follows the
Kolmogorov-Smirnov distribution
2 \sum_{k\ge 1} (-1)^{k-1} e^{-2 k^2 t^2}.
Usage
gof_univariate(
event_log,
model,
covariate,
scope = "all",
mode = "one",
half_life = NULL,
seed = NULL
)
Arguments
event_log |
Dyadic event log. |
model |
Named character vector of |
covariate |
Name of the covariate in |
scope, mode, half_life, seed |
See |
Value
A list with statistic (T_x), p_value (KS),
W (numeric vector of length n, the normalised process),
and u (the time grid in [0, 1]).
References
Boschi M, Wit EC (2025). Goodness of fit in relational event models. Statistics and Computing 36(4).
Examples
data(classroom_events)
gof_univariate(classroom_events,
model = c(reciprocity_count = "linear",
transitivity_count = "linear"),
covariate = "reciprocity_count", seed = 1)
Activity counter for hyperedge subsets
Description
For a focal candidate hyperedge (t, I, J),
activity(t, I, J) counts the number of past events
(t_m, I_m, J_m) with t_m < t
satisfying I \subseteq I_m AND
J \subseteq J_m.
Usage
hyperedge_activity(hyperedge_log, I, J = character(0), t)
Arguments
hyperedge_log |
A hyperedge log (see |
I |
Character vector of sender names defining the focal subset. |
J |
Character vector of receiver names defining the focal
subset. Pass |
t |
Focal time. Only events strictly before |
Value
A single non-negative integer.
References
Lerner J, Boschi M, Wit EC (2025). Subset repetition.
Endogenous features for a hyperedge event log
Description
Hyperedge analogue of endogenous_features(). Accepts a
hyperedge log (see hyperedge_log()) and computes hyperedge-native
statistics, falling back to the dyadic engine for stat names that
belong to the standard dyadic endogenous catalogue.
Usage
hyperedge_features(hyperedge_log, stats, half_life = NULL)
Arguments
hyperedge_log |
A hyperedge log (see |
stats |
Character vector of statistic names. Mix of hyperedge-
native names listed above and the dyadic catalogue names accepted
by |
half_life |
Required when an exp-decay statistic is requested (only applies to delegated dyadic stats; hyperedge subrep does not use a half-life). |
Details
Recognised hyperedge stat names:
"subrep_<rho>_<l>"Directed subset repetition (paper eq. 4).
rho= sender-side subset cardinality (1..|I|),l= receiver-side subset cardinality (0..|J|, 0 = ignore receivers). Examples:"subrep_1_1"(average activity over single-actor sub-pairs),"subrep_2_1"(over pair-of-senders × single-receiver subpairs)."subrep_<rho>"Undirected subset repetition. Equivalent to
"subrep_<rho>_0"; counts past events whose participant set is a superset of the chosen subset, with no receiver-side restriction."activity"Counts past events whose participant set covers the focal event's entire
(I, J)pair. Equivalent to"subrep_<|I|>_<|J|>".
For dyadic-shaped events (every row has |I| = |J| = 1) and a
dyadic stat name, this function delegates to
endogenous_features() via as_dyadic_log().
Value
The hyperedge log with one added column per requested stat.
References
Boschi M, Lerner J, Wit EC (2025). Beyond Linearity and Time-Homogeneity: Relational Hyper Event Models with Time-Varying Non-Linear Effects. arXiv:2509.05289.
See Also
hyperedge_subrep(), hyperedge_activity(),
endogenous_features().
Examples
hl <- hyperedge_log(
I = list(c("a","b"), c("a","c"), c("b","c"), c("a","b","c")),
J = list(c("X"), c("X","Y"), c("Y"), c("X")),
time = c(1, 2, 3, 4))
hyperedge_features(hl,
stats = c("subrep_1_1", "subrep_2_1", "activity"))
Build / detect / convert hyperedge event logs
Description
A hyperedge log generalises the dyadic
(sender, receiver, time) event log used elsewhere in amorem to a
(I, J, time) event log where I and J are list-columns
containing the set of senders and the set of receivers participating
in each hyperevent. This matches the data model of Boschi, Lerner &
Wit (2025): each event is a time-stamped directed hyperedge
(t_m, I_m, J_m) from a sender set to a
receiver set.
Usage
hyperedge_log(I, J, time)
is_hyperedge_log(x)
as_hyperedge_log(event_log)
as_dyadic_log(hyperedge_log)
Arguments
I |
List-column: each element is a character vector of sender
actor names participating in that event. Length-1 vectors are
allowed (and become standard dyadic events when combined with a
length-1 |
J |
List-column: each element is a character vector of receiver actor names. Empty character vectors are allowed and signal an undirected hyperevent. |
time |
Numeric vector of event times. Must be finite and non-decreasing after sorting. |
x |
A data frame or list-of-columns to test or convert. |
event_log |
A dyadic event log with |
hyperedge_log |
A hyperedge log produced by |
Details
The constructor hyperedge_log() accepts list-columns directly and
performs validation (character members, non-empty sets, finite
times, sorted by time). as_hyperedge_log() promotes a dyadic
(sender, receiver, time) data frame to the hyperedge form by
wrapping each sender and receiver in a length-1 character
vector. as_dyadic_log() is the inverse: it succeeds only when
every row of the hyperedge log has a length-1 sender set AND a
length-1 receiver set.
For undirected hyperevents (e.g.\ multi-actor meetings), pass an
empty receiver set: J = list(character(0), character(0), ...).
The receiver list-column must still be present.
Value
A data.frame with columns I, J, time, additionally
carrying class amorem_hyperedge_log to distinguish it from a
dyadic log in dispatch contexts. Sorted by time ascending.
References
Boschi M, Lerner J, Wit EC (2025). Beyond Linearity and Time-Homogeneity: Relational Hyper Event Models with Time-Varying Non-Linear Effects. arXiv:2509.05289.
Examples
# Two co-authored citation events:
hl <- hyperedge_log(
I = list(c("alice", "bob"), c("alice", "carol")),
J = list(c("paperA"), c("paperA", "paperB")),
time = c(1.0, 2.5))
is_hyperedge_log(hl)
# Round-trip a dyadic log:
dy <- data.frame(sender = c("a", "b"),
receiver = c("b", "c"),
time = c(1, 2))
h <- as_hyperedge_log(dy)
as_dyadic_log(h)
Cardinality columns for a hyperedge event log
Description
Adds two integer columns to a hyperedge log: size_I (the number
of senders) and size_J (the number of receivers). Convenient
shortcut for filtering / case-control sampling matched on
cardinality (see Boschi et al. 2025, Section 3.3).
Usage
hyperedge_sizes(hyperedge_log)
Arguments
hyperedge_log |
A hyperedge log. |
Value
The hyperedge log with two added integer columns.
Subset repetition statistic for a hyperedge event log
Description
For a focal hyperedge (t, I, J) and orders
(\rho, \ell), computes the average activity over
every sender subset of I of size rho and every receiver subset
of J of size l, per Boschi, Lerner & Wit (2025) Equation 4:
\mathrm{subrep}^{\rho,\ell}(t,I,J)
= \frac{1}{\binom{|I|}{\rho}\binom{|J|}{\ell}}
\sum_{I' \subseteq I,\ |I'|=\rho}
\sum_{J' \subseteq J,\ |J'|=\ell}
\mathrm{activity}(t, I', J').
Usage
hyperedge_subrep(
hyperedge_log,
I,
J = character(0),
t,
rho = length(I),
l = length(J)
)
Arguments
hyperedge_log |
A hyperedge log (see |
I |
Character vector of senders for the focal event. |
J |
Character vector of receivers (or |
t |
Focal time. |
rho |
Order on the sender side: subset cardinality. Must be
between 1 and |
l |
Order on the receiver side: subset cardinality. Must be
between 0 and |
Details
For dyadic events with |I| = |J| = 1,
subrep(rho = 1, l = 1) reduces to the dyad event count
(already exposed as reciprocity_count and related stats in
endogenous_features()). The function exists because
for true hyperedge data the average over subsets of intermediate
size captures partial-subset repetition that no dyadic statistic
can represent.
Value
A single non-negative numeric.
References
Boschi M, Lerner J, Wit EC (2025). Beyond Linearity and Time- Homogeneity: Relational Hyper Event Models with Time-Varying Non-Linear Effects. arXiv:2509.05289. Lerner J, et al. (2025). The eventnet computation framework.
Examples
hl <- hyperedge_log(
I = list(c("a","b"), c("a","c"), c("b","c")),
J = list(c("X"), c("X","Y"), c("Y")),
time = c(1, 2, 3))
# Activity for the (a, X) sub-pair before t = 4:
hyperedge_activity(hl, I = "a", J = "X", t = 4)
# First-order subrep on event (a, b) -> X at t = 4:
hyperedge_subrep(hl, I = c("a","b"), J = "X", t = 4, rho = 1, l = 1)
Martingale residuals from a case-control partial-likelihood fit
Description
Computes per-observation martingale residuals
M_i = y_i - \pi_i from a one-control-per-case
partial-likelihood fit, where y_i is the case indicator
inside the (case, control) pair and
\pi_i \;=\; \frac{\exp(\eta_i)}{\exp(\eta_{\mathrm{case}}) + \exp(\eta_{\mathrm{ctrl}})}
is the fitted probability that observation i is the event
in its risk set. The residuals sum to zero within each stratum.
Usage
martingale_residuals(
event_log,
model,
scope = c("all", "appearance", "citation"),
mode = c("one", "two"),
half_life = NULL,
seed = NULL
)
Arguments
event_log |
Dyadic event log (see |
model |
A named character vector mapping statistic name to
|
scope, mode, half_life, seed |
Same meaning as in
|
Details
Useful as a goodness-of-fit diagnostic: plotting residuals vs. time or
vs. a covariate reveals systematic miscalibration. The convention
matches survival::residuals.coxph(type = "martingale") for the
two-element risk set induced by 1-control case-control sampling.
Only the linear partial-likelihood path (compare_models()-style
linear-effect specs) is supported by this helper; for smooth-effect
fits the case-vs-control matrix design used by
compare_models_smooth() does not have a clean per-observation
martingale interpretation.
Value
A data frame with one row per observation in the case-control
table (so 2N rows for N events), with columns:
stratum, role ("case" or "control"), sender, receiver,
time, eta, fitted_prob, residual.
References
Therneau TM, Grambsch PM, Fleming TR (1990). Martingale-based residuals for survival models. Biometrika 77(1), 147–160.
See Also
compare_models(), compare_models_smooth().
Examples
data(classroom_events)
res <- martingale_residuals(
classroom_events,
model = c(reciprocity_count = "linear",
transitivity_count = "linear"),
seed = 1)
plot(res$time, res$residual,
col = ifelse(res$role == "case", "red", "grey60"),
ylab = "Martingale residual", xlab = "Event time")
abline(h = 0)
Control parameters for the neural-network backend of rem()
Description
Collects the architecture and training hyper-parameters used by
rem(method = "nn"). Training maximizes the same conditional-logistic
partial likelihood as method = "clogit" (softmax over each risk set), so
this backend is a drop-in flexible counterpart of the linear conditional
logit. Two predictor architectures are available:
"mlp"a multilayer perceptron scoring the full covariate vector jointly — can represent interactions between statistics.
"additive_spline"an additive predictor
sum_k f_k(x_k)with eachf_ka B-spline expansion fitted by (mini-batch) stochastic gradient — the STREAM construction of Filippi-Mazzola & Wit (2024, JRSS-C 73(4), doi:10.1093/jrsssc/qlae023). Interpretable per-feature curves; withbatch_stratait scales to event logs far beyond what an in-memory smooth fit can hold.
Usage
nn_control(
hidden = c(16L, 8L),
activation = c("relu", "tanh"),
architecture = c("mlp", "additive_spline"),
spline_df = 8L,
batch_strata = NULL,
epochs = 300L,
lr = 0.01,
l2 = 1e-04,
validation = 0.2,
patience = 25L,
standardize = TRUE,
engine = c("r", "torch"),
seed = NULL,
verbose = FALSE
)
Arguments
|
Integer vector of hidden-layer sizes for | |
activation |
Hidden-layer activation for |
architecture |
Predictor architecture: |
spline_df |
Degrees of freedom (basis size) per covariate for
|
batch_strata |
Optional mini-batch size, in strata, for stochastic
gradient training. |
epochs |
Maximum number of training epochs (full passes over the training strata). |
lr |
Adam learning rate. |
l2 |
L2 penalty (weight decay). The pure-R engine penalises the weights
only; the torch engine applies it via Adam's |
validation |
Fraction of strata held out for validation / early
stopping. Set to |
patience |
Early-stopping patience: training stops after this many epochs without improvement of the validation loss; the best parameters are restored. |
standardize |
Z-score the features before training (recommended; the
scaling is stored and re-applied by |
engine |
Training engine: |
seed |
Optional integer seed for reproducible initialization and validation split. |
verbose |
Print the loss every 50 epochs. |
Value
A list of class "nn_control".
See Also
Bootstrap uncertainty for the neural rem() backend
Description
Quantifies uncertainty for a rem(method = "nn") fit by a stratum
bootstrap: the case-control strata are resampled with replacement, the
network is refit on each resample (reusing the original nn_control()
settings, including the training engine), and the spread across refits
yields partial-dependence uncertainty bands and a concordance confidence
interval. This is the inferential counterpart that the point-prediction
nn backend otherwise lacks (coef() returns NULL).
Usage
nn_uncertainty(
object,
data,
B = 200L,
case = NULL,
stratum = NULL,
n_grid = 50L,
level = 0.95,
seed = NULL
)
Arguments
object |
A fitted |
data |
The case-control data frame the model was fit on (same columns). |
B |
Number of bootstrap resamples. |
case, stratum |
Event-indicator and stratum columns, resolved exactly as
in |
n_grid |
Grid resolution for the partial-dependence curves. |
level |
Confidence level for the bands and the concordance interval. |
seed |
Optional integer seed for the resampling. |
Details
Each bootstrap partial-dependence curve is centred (its grid-mean removed) before the pointwise quantiles are taken, so the bands describe uncertainty in the shape of each effect, not the conditional-logit's unidentified per-stratum offset.
Value
An object of class "nn_uncertainty": a per-feature list of
data.frame(x, lo, med, hi) bands, a concordance quantile interval, and
the settings B, level. Has print() and plot() methods.
See Also
Plot partial-dependence uncertainty bands
Description
Plot partial-dependence uncertainty bands
Usage
## S3 method for class 'nn_uncertainty'
plot(x, ...)
Arguments
x |
An |
... |
Passed to the underlying |
Value
x, invisibly.
Manufacturing-company email events (Michalski et al. 2014)
Description
Time-stamped directed emails among employees of a mid-sized
manufacturing company over a nine-month period. Sourced from Network
Repository as the ia-radoslaw-email dataset.
Usage
radoslaw_email
Format
A data frame with 82,927 rows and 4 columns:
- time
Days since the first email.
attr(., "unix_origin")holds the Unix epoch oftime = 0.- sender
Character employee id.
- receiver
Character employee id.
- weight
Integer —
1for every record in the original file.
Source
Michalski, R., Palus, S., Kazienko, P. (2014). Seed selection for spread of influence in social networks: Temporal vs. static approach. New Generation Computing 32(3–4), 213–235. doi:10.1007/s00354-014-0402-9. Distributed via https://networkrepository.com/ia-radoslaw-email.php.
Fit a relational (hyper)event model on preprocessed case-control data
Description
rem() is the unified front-end for fitting relational event models from
already preprocessed case-control data (e.g. produced by eventnet),
where the endogenous/exogenous covariates have already been computed. It is
intended to supersede compare_models(), compare_models_smooth() and
compare_models_global(), which couple feature computation and fitting.
Usage
rem(
formula,
data,
method = c("gam", "clogit", "nn"),
case = NULL,
stratum = NULL,
time = NULL,
k = NULL,
gam_method = NULL,
nn = nn_control(),
...
)
Arguments
formula |
A formula; see Formula syntax. |
data |
A data.frame of preprocessed case-control data (wide for the
|
method |
Estimation backend; see Description. |
case |
Optional name of the 0/1 event-indicator column for the |
stratum |
Name of the column grouping each case with its controls
(required by |
time |
Name of the time column, required for |
k |
Optional integer basis dimension passed to |
gam_method |
Smoothness-selection method for the |
nn |
An |
... |
Reserved for future use. |
Details
Two estimation backends are provided:
"gam"Degenerate logistic regression on a case-1-control design (Boschi, Lerner & Wit 2025): the response is a constant 1 and the linear predictor is built from event-minus-control differences. Supports smooth time-varying (
tv), non-linear (nl) and time-varying non-linear (tvnl) effects viamgcv::gam()."clogit"Conditional logistic regression on a case-k-control design via
survival::clogit()(linear terms only). The case/control strata are taken fromstratum, or derived ascumsum(case == 1)whenstratumisNULL(assuming each case is immediately followed by its controls, the eventnet blocked layout)."nn"Flexible conditional-logistic models on the same case-k-control design as
clogit, trained by (mini-batch) gradient descent on the exact risk-set softmax partial likelihood. Two architectures viann_control(): a multilayer perceptron scoring the full covariate vector jointly (interaction-capable), or anadditive_splinepredictor — per-covariate B-spline expansions fitted by stochastic gradient, the STREAM construction of Filippi-Mazzola & Wit (2024, JRSS-C). No coefficient table;summary()reports in-sample (and, with a validation split, held-out) concordance andplot(type = "pdp")shows per-feature curves. Pure-R implementation, no extra dependencies.
Value
An object of class "rem": a list with the fitted model ($fit),
the method, the original formula, the parsed terms, and the number of
observations n. Has summary(), coef(), plot() and logLik()
methods.
Formula syntax
The right-hand side lists covariates. A bare name is a linear effect; wrap
a name to request a smooth effect (gam method only):
-
tv(x)— time-varying linear effect:s(time, by = d_x). -
nl(x)— non-linear effect:s(cbind(x_ev, x_nv), by = c(1, -1)). -
tvnl(x)— time-varying non-linear effect (tensor product). -
re(x)— random effect of a grouping factorx(e.g. the sender), built from the matchedx_ev/x_nvlevels ass(cbind(x_ev, x_nv), by = cbind(1, -1), bs = "re"), contributingf(event_level) - f(control_level)(following the REM tutorial's species-invasiveness term). Falls back to a single columnxwhenx_ev/x_nvare absent. Identified only when the event and control differ onx.
For the gam method the left-hand side is ignored (the response is the
constant case indicator); for clogit the left-hand side names the 0/1 event
indicator column (e.g. event ~ x), unless case is given explicitly.
Column resolution
For a covariate x, the event/control difference is taken from column x,
else d_x, else x_ev - x_nv. Non-linear terms use transform_x_ev /
transform_x_nv when present (the eventnet spline-transformed covariate),
otherwise x_ev / x_nv. tvnl uses transformed_time when present.
Undirected logs (senders only, no receiver/TARGET column) are supported.
See Also
compare_models_smooth() (superseded), simulate_relational_events()
(whose wide = TRUE output is a valid input here),
simulate_directed_hyperevents_tvnl().
Examples
set.seed(1)
w <- simulate_relational_events(
n_events = 300, senders = paste0("a", 1:12), receivers = paste0("a", 1:12),
n_controls = 1, endogenous_stats = "reciprocity_count",
endogenous_effects = c(reciprocity_count = 0.6), wide = TRUE)
fit <- rem(~ reciprocity_count, data = w, method = "gam")
coef(fit)
Sample non-events for inference
Description
Given an observed event log, generate nested case-control data by sampling counterfactual sender–receiver pairs according to predefined strategies.
Usage
sample_non_events(
event_log,
n_controls = 1,
scope = c("all", "appearance", "citation"),
mode = c("two", "one"),
risk = c("standard", "remove"),
exclude_pairs = NULL,
allow_loops = FALSE,
seed = NULL,
max_attempts = 1000
)
Arguments
event_log |
Data frame with columns |
n_controls |
Number of non-events (controls) to sample per realized event. |
scope |
Candidate set definition. |
mode |
|
risk |
Strategy governing the risk set. |
exclude_pairs |
Optional two-column data.frame/matrix of
|
allow_loops |
Logical; can sampled non-events have identical sender and receiver? |
seed |
Optional seed for reproducibility. |
max_attempts |
Maximum resampling attempts per control before giving up (prevents infinite loops when candidate sets are small). |
Value
A data.frame containing the original events (event = 1) and the
sampled controls (event = 0), grouped by stratum identifiers.
Examples
data(classroom_events)
cc <- sample_non_events(classroom_events, n_controls = 1, seed = 1)
head(cc)
Simulate exogenous actor covariates
Description
Create simple exogenous covariate structures for senders and receivers. The function can return static values (one row per actor) or time-stamped processes (one row per actor and time point) that follow independent AR(1) dynamics.
Usage
simulate_actor_covariates(
senders,
receivers,
covariate_names,
time_points = NULL,
sd = 1,
rho = 0,
seed = NULL
)
Arguments
senders |
Character vector of sender actors. |
receivers |
Character vector of receiver actors. |
covariate_names |
Character vector naming the covariates to simulate. |
time_points |
Optional numeric vector of strictly increasing time stamps for time-varying covariates. When omitted, static covariates are returned. |
sd |
Standard deviation of the innovation noise. |
rho |
AR(1) coefficient used when |
seed |
Optional integer to make the simulation reproducible. |
Value
A list with two elements: sender_covariates and
receiver_covariates. Each element is either a wide data.frame (static
case) or a tidy data.frame with columns actor, time,
covariate, and value (dynamic case).
Examples
sender_cov <- simulate_actor_covariates(
senders = letters[1:3],
receivers = LETTERS[1:2],
covariate_names = c("activity", "recency"),
time_points = seq(0, 4),
rho = 0.6,
sd = 0.2,
seed = 123
)
str(sender_cov)
Simulate directed two-mode hyperedge events
Description
Generates a sequence of directed hyperevents from a sender set
I_m \subseteq V^I to a receiver set
J_m \subseteq V^J, with both I_m
and J_m non-empty. This is the directed two-mode
counterpart to simulate_hyperedge_events() and matches the data
model used in Boschi, Lerner & Wit (2025) Section 5 for
citation networks (authors citing papers).
Usage
simulate_directed_hyperedge_events(
n_events,
senders,
receivers,
min_size_I = 1L,
max_size_I = 1L,
min_size_J = 1L,
max_size_J = 1L,
baseline_rate = 1,
endogenous_stats = character(0),
endogenous_effects = numeric(0),
start_time = 0
)
Arguments
n_events |
Number of events to simulate. |
senders |
Character vector of sender names |
receivers |
Character vector of receiver names |
min_size_I, max_size_I |
Sender-side cardinality bounds. |
min_size_J, max_size_J |
Receiver-side cardinality bounds. |
baseline_rate |
Multiplicative baseline ( |
endogenous_stats |
Character vector of supported stat names:
|
endogenous_effects |
Numeric vector of coefficients,
same length and order as |
start_time |
Simulation start time. |
Details
At each step the simulator enumerates every candidate hyperedge
(I, J) with
|I| \in [\mathrm{min\_size\_I}, \mathrm{max\_size\_I}]
and |J| \in [\mathrm{min\_size\_J}, \mathrm{max\_size\_J}],
computes the rate
\lambda(t, I, J) \;=\; \mathrm{baseline\_rate} \;\cdot\;
\exp\!\left(\sum_k \beta_k \, x_k(t, I, J)\right),
and draws one event proportional to its rate. The waiting time is exponential with rate equal to the total intensity.
Candidate-space size is exponential in
|V^I| and |V^J|, so practical use is
limited to small actor / item universes.
Value
A directed hyperedge log
(amorem_hyperedge_log data frame with I, J, time columns;
J non-empty on every row).
References
Boschi M, Lerner J, Wit EC (2025). Beyond Linearity and Time-Homogeneity: Relational Hyper Event Models with Time-Varying Non-Linear Effects. arXiv:2509.05289, Section 5.
Examples
hl <- simulate_directed_hyperedge_events(
n_events = 40,
senders = paste0("a", 1:4),
receivers = paste0("p", 1:4),
max_size_I = 2, max_size_J = 2,
baseline_rate = 0.3,
endogenous_stats = c("subrep_1_1", "size_I"),
endogenous_effects = c(subrep_1_1 = 0.8, size_I = -0.4))
Simulate directed hyper-events with time-varying and non-linear effects
Description
A teaching-oriented simulator for directed relational hyper-events in
which the sender set and the receiver set are disjoint, driven by exogenous
group covariates with a time-varying effect on the sender side and a
non-linear effect on the receiver side. It is the packaged, parameterised
form of the workshop running example (sunbelt-workshop-materials/running_example.R)
and produces a ready-to-fit case-control dataset for GAM-based estimation of
smooth (TV / NL) effects.
Usage
simulate_directed_hyperevents_tvnl(
sender_attr,
receiver_attr,
time_varying_effect = function(t) sin(2 * t),
nonlinear_effect = function(x) -4 + 2 * exp(-((x - 3)^2)/(2 * 2^2)),
horizon = 2,
dt = 0.01,
n_controls = 1L,
max_group_size_sender = length(sender_attr),
max_group_size_receiver = length(receiver_attr)
)
Arguments
sender_attr |
Named numeric vector of sender actor attributes (names are the sender ids). |
receiver_attr |
Named numeric vector of receiver actor attributes. |
time_varying_effect |
Function of one argument giving the time-varying
coefficient |
nonlinear_effect |
Function of one argument giving the non-linear effect
|
horizon |
Positive numeric; the simulation end time (start is 0). |
dt |
Positive numeric; the time-grid step used by the thinning scheme
(must be smaller than |
n_controls |
Non-negative integer; number of non-event pairs sampled per event. Defaults to 1 (a case-1-control design). |
max_group_size_sender, max_group_size_receiver |
Integers; the largest subset size considered when enumerating sender / receiver groups. Default to the full actor sets (all non-empty subsets). Use 1 for ordinary dyadic events. |
Details
Each sender (resp. receiver) group is a non-empty subset of the sender
(resp. receiver) actors – a hyperedge endpoint. A group's covariate is the
mean of its members' actor attributes. For an ordered group pair
(g_s, g_r) the instantaneous rate at time t is
\lambda_{g_s, g_r}(t) = \exp\bigl(\alpha(t)\, x_{g_s} + f(x_{g_r})\bigr),
where \alpha(t) is the time-varying sender effect
(time_varying_effect), f(\cdot) is the non-linear receiver effect
(nonlinear_effect), and x denotes a group covariate. Events are drawn
on a fixed time grid of width dt by a thinning scheme: within each step the
next inter-event time is sampled from an exponential with the total rate
evaluated at the step midpoint, and the firing pair is chosen with
probability proportional to its rate. For every realised event,
n_controls non-event pairs are sampled uniformly from the remaining group
pairs, yielding a case-n_controls-control design.
Setting max_group_size_sender = 1 and max_group_size_receiver = 1 reduces
the groups to single actors, i.e. ordinary directed dyadic events.
Value
A long-format data.frame, one row per (event or control), with
columns event_id (links a case to its controls), event_time,
event (1 = realised event, 0 = sampled non-event),
sender_group, receiver_group (group labels), and
cov_sender, cov_receiver (group covariates). The true
data-generating effect functions are attached as
attr(x, "truth") (a list with time_varying_effect,
nonlinear_effect, and horizon) for comparison against fitted smooths.
Examples
set.seed(1234)
sa <- setNames(rnorm(4, 5, 1.5), paste0("S", 1:4))
ra <- setNames(rnorm(4, 3, 2.0), paste0("R", 1:4))
d <- simulate_directed_hyperevents_tvnl(sa, ra, horizon = 2, n_controls = 1)
head(d)
table(d$event)
Simulate undirected hyperedge events (multi-actor meetings)
Description
Generates a sequence of undirected hyperevents — meetings of
varying size drawn from the actor set actors — under a linear
hyperedge model. Mirrors the simulation setup of Boschi, Lerner &
Wit (2025) Section 4: each event is a subset of actors with size
in 1..max_size, fired with rate
\lambda(t, I) \;=\; \mathrm{baseline\_rate} \;\cdot\;
\exp\!\left(\sum_k \beta_k \, x_k(t, I)\right),
where each x_k(t, I) is one of the hyperedge-native
covariates supported by hyperedge_features()
(activity, subrep_<rho> for undirected events) or size
(the event's cardinality |I|).
Usage
simulate_hyperedge_events(
n_events,
actors,
max_size,
baseline_rate,
endogenous_stats = character(0),
endogenous_effects = numeric(0),
start_time = 0,
min_size = 1L
)
Arguments
n_events |
Number of events to simulate. |
actors |
Character vector of actor names. |
max_size |
Maximum allowed meeting size ( |
baseline_rate |
Multiplicative baseline ( |
endogenous_stats |
Character vector of stat names accepted by
|
endogenous_effects |
Numeric vector of coefficients, same
length and order as |
start_time |
Simulation start time. |
min_size |
Minimum allowed meeting size. Defaults to 1. |
Details
At each step the simulator enumerates every subset of actors
with size in 1..max_size. The per-event work is therefore
O\!\left(\sum_{s=1}^{w} \binom{|V|}{s}\right); practical for small actor counts (e.g. |V| \le 20,
max_size <= 4).
Value
A hyperedge log (see hyperedge_log()) with n_events
rows.
References
Boschi M, Lerner J, Wit EC (2025). Beyond Linearity and Time-Homogeneity: Relational Hyper Event Models with Time-Varying Non-Linear Effects. arXiv:2509.05289.
Examples
# Five-actor meetings of size up to 3, with weak attractor on
# repeated triads and a size penalty:
hl <- simulate_hyperedge_events(
n_events = 50,
actors = LETTERS[1:5],
max_size = 3,
baseline_rate = 0.2,
endogenous_stats = c("subrep_2", "size"),
endogenous_effects = c(subrep_2 = 0.5, size = -0.3))
Simulate relational event sequences
Description
Generate a simple relational event log for a sender set and receiver set using a softmax allocation rule over dyadic intensities. The process follows the Gillespie algorithm, where the time between events is drawn from an exponential distribution with rate equal to the sum of all dyadic intensities.
Usage
simulate_relational_events(
n_events,
senders,
receivers,
baseline_rate = 1,
start_time = 0,
horizon = Inf,
contribution_logits = NULL,
sender_covariates = NULL,
sender_effects = NULL,
receiver_covariates = NULL,
receiver_effects = NULL,
allow_loops = FALSE,
n_controls = 0,
endogenous_stats = NULL,
endogenous_effects = NULL,
global_covariates = NULL,
global_effects = NULL,
method = c("gillespie", "tau_leap"),
tau = NULL,
half_life = NULL,
risk = c("standard", "remove"),
wide = FALSE
)
Arguments
n_events |
Number of events to generate. |
senders |
Character vector listing the sender set |
receivers |
Character vector listing the receiver set |
baseline_rate |
Positive scalar. A constant baseline hazard multiplier applied to all dyads. Defaults to 1. |
start_time |
Initial time stamp. |
horizon |
Optional maximum horizon; simulation stops once the cumulative time would exceed this value. |
contribution_logits |
Optional |
sender_covariates |
Optional numeric data.frame/matrix with one row per sender. |
sender_effects |
Optional numeric vector of coefficients for
|
receiver_covariates |
Optional numeric data.frame/matrix with one row per receiver. |
receiver_effects |
Optional numeric vector of coefficients for
|
allow_loops |
Logical; whether sender and receiver can coincide. |
n_controls |
Integer; number of non-events (controls) to sample
uniformly at random for each realized event. If |
endogenous_stats |
Optional character vector of endogenous mechanisms to include in the rate. Each entry updates a state matrix after every event so the intensity of the next event depends on the realized history. Supported values:
Defaults to |
endogenous_effects |
Numeric vector of linear coefficients for
|
global_covariates |
Optional data.frame describing piecewise-constant
global covariates: variables whose value at time |
global_effects |
Numeric vector of linear coefficients for the global
covariates. May be named (names must match the covariate columns in
|
method |
Simulation algorithm. Either |
tau |
Positive scalar; the step size for |
half_life |
Positive scalar; the half-life |
risk |
Risk-set rule. |
wide |
Logical; when |
Details
When global_covariates is supplied, the simulator uses a
boundary-aware Gillespie scheme: the total event rate is rescaled by
\exp(\sum_k \beta_k\,x_k(t)); whenever a sampled waiting time would
cross an interval boundary, the clock is advanced to the boundary without
recording an event, and the next waiting time is redrawn under the new
global multiplier. Global covariates do not change the per-dyad selection
probabilities (the multiplier cancels), only the waiting-time
distribution. When combined with endogenous_stats, the
per-dyad rates are recomputed at every step from the current endogenous
state and then rescaled by the global multiplier.
The "tau_leap" algorithm advances the clock by a user-chosen step
\tau and draws, for every dyad, a \mathrm{Poisson}(\lambda_{sr}(t)\,\tau)
number of events using the rates at the start of the step. Multiple
events can fire in the same step; they are placed at uniform times within
[t, t+\tau) and reported in time order, but they share the
start-of-step endogenous state and global multiplier. Endogenous state is
updated once at the end of the step using all events in that step. The
tau-leap algorithm trades exactness for predictable, vectorised work per
step; it is most useful for high-rate regimes or for problems where the
per-event recomputation in the Gillespie path is the bottleneck. Choose
\tau small enough that (i) \lambda \tau \ll 1 on every active
dyad and (ii) \tau is smaller than the shortest interval in
global_covariates (within-step boundary crossings are not
resolved; the start-of-step global multiplier is used for the entire
step).
Value
If n_controls = 0, a data.frame with columns sender,
receiver and time. If n_controls > 0, it returns a
long-format data.frame with additional columns stratum (grouping an
event with its controls) and event (1 for the realized event,
0 for controls). When endogenous_stats is supplied, one extra column
per stat is appended carrying the value each row's dyad had at its event
time (immediately before the event fired), so downstream conditional
logistic / GAM estimators can recover the effects. When
global_covariates is supplied, one column per covariate is appended
carrying the value of that covariate at each row's event time. When
wide = TRUE (which requires n_controls = 1), this long
case-control output is reshaped to one row per event with columns
stratum, time, sender_ev, receiver_ev,
sender_nv, receiver_nv and, for each covariate,
<cov>_ev, <cov>_nv and d_<cov>.
Examples
set.seed(1)
senders <- receivers <- LETTERS[1:3]
sender_cov <- data.frame(activity = c(0.5, -0.2, 1.1))
receiver_cov <- data.frame(popularity = c(0.1, 0.3, -0.4))
# Standard event simulation
events <- simulate_relational_events(
n_events = 5,
senders = senders,
receivers = receivers,
sender_covariates = sender_cov,
sender_effects = 1,
receiver_covariates = receiver_cov,
receiver_effects = 2
)
events
# Case-control generation for partial likelihood inference
cc_events <- simulate_relational_events(
n_events = 5,
senders = senders,
receivers = receivers,
sender_covariates = sender_cov,
sender_effects = 1,
n_controls = 2
)
head(cc_events)
# Ready-made case-1-control (wide) dataset for degenerate logistic regression
wide_events <- simulate_relational_events(
n_events = 5,
senders = senders,
receivers = receivers,
n_controls = 1,
endogenous_stats = "reciprocity_count",
endogenous_effects = c(reciprocity_count = 0.6),
wide = TRUE
)
head(wide_events)
Actor attributes for the Social Evolution study
Description
Per-actor covariates for social_evolution_calls and social_evolution_friendship.
Usage
social_evolution_actors
Format
A data frame with 84 rows and 4 columns:
- id
Character actor id (
"Actor 1","Actor 2", …).- present
Logical — whether the actor was present at the start of the study window.
- floor
Integer dormitory floor.
- gradeType
Factor — student grade type (freshman, sophomore, junior, senior, graduate-tutor).
Source
Madan et al. (2011), via goldfish. See
social_evolution_calls.
Phone calls in the Social Evolution study (Madan et al. 2011)
Description
Time-stamped directed phone calls among undergraduates in an MIT
residence hall over the 2008–2009 academic year. Sourced from the
goldfish R package (Social_Evolution$calls).
Usage
social_evolution_calls
Format
A data frame with 439 rows and 4 columns:
- time
Days since the first recorded call.
attr(., "unix_origin")holds the Unix epoch oftime = 0.- sender
Character actor id matching social_evolution_actors
$id.- receiver
Same domain as
sender.- increment
Integer increment recorded for the call (typically 1).
Source
Madan, A., Cebrian, M., Moturu, S., Farrahi, K. (2011).
Sensing the "health state" of a community. IEEE Pervasive
Computing 11(1), 36–45. doi:10.1109/MPRV.2011.79. Redistributed
via the goldfish R package (github.com/snlab-ch/goldfish),
dataset Social_Evolution.
See Also
social_evolution_actors, social_evolution_friendship
Friendship-survey events for the Social Evolution study
Description
Self-reported friendship ties recorded at survey waves throughout the Social Evolution study.
Usage
social_evolution_friendship
Format
A data frame with 766 rows and 4 columns:
- time
Days since the first recorded call (same origin as social_evolution_calls).
attr(., "unix_origin")holds the Unix epoch oftime = 0.- sender
Character actor id (the survey respondent).
- receiver
Character actor id (the nominated friend).
- replace
Integer —
1adds the tie,0removes it.
Source
Madan et al. (2011), via goldfish.
See Also
social_evolution_calls, social_evolution_actors
Standardize a relational event log
Description
Module A focuses on preprocessing utilities. This helper normalizes user
supplied event logs into the canonical sender/receiver/time structure
expected elsewhere in the package. It also handles common cleaning tasks such
as sorting, dropping missing rows, and removing loops.
Usage
standardize_event_log(
event_log,
sender_col = "sender",
receiver_col = "receiver",
time_col = "time",
sort = TRUE,
drop_nas = TRUE,
drop_loops = FALSE,
strictly_increasing_time = FALSE,
remove_duplicates = TRUE,
keep_extra = TRUE
)
Arguments
event_log |
A data.frame (or tibble) containing at least one row per event. |
sender_col, receiver_col, time_col |
Column names storing the sender, receiver, and time information. |
sort |
Logical; should the output be sorted by time (ties are kept in input order)? |
drop_nas |
Logical; if |
drop_loops |
Logical; when |
strictly_increasing_time |
Logical; if |
remove_duplicates |
Logical; drop duplicated combinations of sender/receiver/time. |
keep_extra |
Logical; if |
Value
A data.frame with columns sender, receiver, and time. The return
object is tagged with class "amorem_event_log" for downstream dispatch.
Examples
data(classroom_events)
std <- standardize_event_log(classroom_events)
head(std)
Recency transform of inter-event time gaps
Description
Maps non-negative time gaps \delta to bounded recency
weights via
w(\delta) \;=\; \exp\!\Bigl(-\frac{\delta}{2\,m}\Bigr),
where m is the median of the supplied (or reference) gaps.
Large gaps map toward 0; gaps near 0 map toward 1. The half-life of
the kernel is 2 m \log 2, so the median gap
itself is mapped to approximately
e^{-1/2} \approx 0.607.
Usage
transform_recency(delta, half_life = NULL, reference = NULL)
Arguments
delta |
Numeric vector of non-negative time gaps. NAs propagate. |
half_life |
Optional positive scalar. If supplied, used directly
as the kernel scale |
reference |
Optional numeric vector. If supplied, the median is
computed on |
Details
This is the data-driven recency parametrisation used as a preprocessing
step for global and exogenous covariates in
Lembo, Juozaitiene, Vinciotti & Wit (2025) and matches the
"recency" axis of endogenous_features().
Value
Numeric vector the same length as delta, with values in
(0, 1]. NAs in delta are preserved.
References
Lembo M, Juozaitiene R, Vinciotti V, Wit EC (2025). Relational Event Models with Global Covariates. JRSS-C.
Examples
set.seed(1)
gaps <- rexp(20, rate = 0.5)
transform_recency(gaps)
transform_recency(gaps, half_life = 1)
Convert a long case-control event log to wide case-1-control format
Description
Reshapes a long case-(k-)control dataset – one row per case and per control,
with a 0/1 case indicator – into a wide case-1-control table with one row
per case. For each covariate the event value (<cov>_ev), the matched
control value (<cov>_nv) and their difference (d_<cov>, event minus
control) are emitted, ready for the gam backend of rem().
Usage
widen_case_control(
data,
case = NULL,
stratum = NULL,
covariates = NULL,
control_index = 1L,
keep_ids = TRUE
)
Arguments
data |
A long case-control data.frame. |
case |
Optional name of the 0/1 event-indicator column. If |
stratum |
Optional name of the column grouping each case with its
controls. When |
covariates |
Character vector of covariate columns to widen. When
|
control_index |
Which control within each stratum to pair with the case (default the first). Lets a case-k-control log be reduced to case-1-control. |
keep_ids |
Logical; when |
Details
This is the preprocessing companion to rem() for eventnet-style output,
where a case row is followed by its controls and the stratum id is left blank
on control rows.
Value
A data.frame with one row per case: a stratum column, the
sender/receiver identifiers (sender_ev/receiver_ev/sender_nv/
receiver_nv, when present in data and keep_ids = TRUE) and, for each
covariate, <cov>_ev, <cov>_nv and d_<cov>. Strata without exactly one
case or without the requested control are dropped (with a message).
See Also
rem(), simulate_relational_events() (wide = TRUE).
Examples
set.seed(1)
long <- data.frame(
IS_OBSERVED = rep(c(1, 0, 0), 4),
x = rnorm(12), y = rnorm(12))
widen_case_control(long, control_index = 1)