Mixed-Distribution Series Systems

The Motivation

Classical reliability texts assume all components in a series system share the same lifetime distribution: all exponential, all Weibull, or (in the homogeneous case) all Weibull with a shared shape. This simplification enables closed-form likelihood analysis — which is why the reference implementations shipped with maskedcauses cover these specific families.

Real systems rarely cooperate. A server’s failure modes include:

When you fit a single-family model to such a system, you’re forced to average over regimes that behave qualitatively differently. maskedhaz lets each component carry its own distribution.

A Worked Example: Medical Device with Four Failure Modes

Consider an implantable medical device with four independently-failing components:

Component Distribution Parameters Failure physics
Battery Weibull shape \(=3\), scale \(=10\) (years) Wear-out at end of rated life
Electronics Exponential rate \(= 0.02\) Random bit-flips / latch-ups
Lead wire Gompertz \(a=0.02\), \(b=0.3\) Fatigue cycling, accelerating
Sensor Log-logistic \(\alpha = 8\), \(\beta = 2\) Hump-shaped: peaks in mid-life, declines after
library(maskedhaz)

device <- dfr_series_md(components = list(
  battery    = dfr_weibull(shape = 3, scale = 10),
  electronics = dfr_exponential(0.02),
  lead_wire  = dfr_gompertz(a = 0.02, b = 0.3),
  sensor     = dfr_loglogistic(alpha = 8, beta = 2)
))
device
#> Masked-cause likelihood model (4-component series)
#>   Component 1: 2 param(s) [ 3, 10]
#>   Component 2: 1 param(s) [0.02]
#>   Component 3: 2 param(s) [0.02, 0.30]
#>   Component 4: 2 param(s) [8, 2]
#> Data columns: t (lifetime), omega (type), x * (candidates)

The total parameter count is \(2 + 1 + 2 + 2 = 7\). param_layout() shows which components own which positions:

param_layout(device$series)
#> [[1]]
#> [1] 1 2
#> 
#> [[2]]
#> [1] 3
#> 
#> [[3]]
#> [1] 4 5
#> 
#> [[4]]
#> [1] 6 7

Simulating Field Data

We simulate 500 devices observed for 8 years (right-censoring at tau = 8). Masking probability p = 0.3 means diagnostics include each non-failing component in the candidate set with 30% probability.

true_par <- c(3, 10,        # battery: shape, scale
              0.02,         # electronics: rate
              0.02, 0.3,    # lead wire: a, b
              8, 2)         # sensor: alpha, beta

set.seed(2026)
rdata_fn <- rdata(device)
df <- rdata_fn(theta = true_par, n = 500, tau = 8, p = 0.3)

table(df$omega)
#> 
#> exact right 
#>   437    63

About 87% of devices reach failure within the 8-year observation window, the rest are right-censored — a high exact-failure fraction because the combined hazard becomes substantial by the end of the window.

Fitting

The exp/Weibull reference implementations in maskedcauses don’t cover this mixture — no single family-specific likelihood does. But the protocol they implement (the series_md class, the C1–C2–C3 schema, the loglik/score/fit generics) applies uniformly. maskedhaz is another implementation under the same protocol: it handles arbitrary component combinations via component-wise hazard accumulation and numerical integration where needed (none here, since we have no left or interval censoring).

solver <- fit(device)

# Initialize reasonably far from the truth to exercise the optimizer
init <- c(2, 8, 0.03, 0.03, 0.2, 6, 1.5)
result <- solver(df, par = init)

Building the comparison table — coef() from stats, vcov() from stats, and the se() diagonal computation is what algebraic.mle::se() wraps:

comparison <- data.frame(
  parameter = c("battery shape", "battery scale",
                "electronics rate",
                "lead wire a", "lead wire b",
                "sensor alpha", "sensor beta"),
  truth    = true_par,
  estimate = round(coef(result), 4),
  se       = round(sqrt(diag(vcov(result))), 4)
)
comparison
#>          parameter truth estimate     se
#> 1    battery shape  3.00   3.1443 0.3069
#> 2    battery scale 10.00   9.3948 0.4318
#> 3 electronics rate  0.02   0.0250 0.0043
#> 4      lead wire a  0.02   0.0135 0.0033
#> 5      lead wire b  0.30   0.3613 0.0486
#> 6     sensor alpha  8.00   7.1420 0.3617
#> 7      sensor beta  2.00   2.4346 0.1893

All seven parameters are recovered within roughly one standard error of truth. Identifiability is comfortable here because the four components have qualitatively different hazard shapes — the system hazard \(h_{\text{sys}}(t) = \sum_j h_j(t)\) carries distinguishable contributions from each regime, and each regime dominates a different time window within the 8-year observation horizon.

Why Identifiability Works for Mixed Families

A fundamental constraint in masked-cause inference: when all components share a family, sometimes only aggregate parameters are identifiable from system-level data. The canonical example is exponential components — only \(\sum_j \lambda_j\) is identifiable, not the individual rates (see vignette("censoring-and-masking") for the details).

Mixed families break this degeneracy. If component 1 is Weibull (power-law growth) and component 2 is exponential (flat), the system hazard shape tells you which component dominates at which time regime. The likelihood surface separates them cleanly.

We can visualize this by looking at conditional cause probabilities — given a failure at time \(t\), which component was most likely at fault?

ccp_fn <- conditional_cause_probability(device)
t_grid <- seq(0.5, 15, length.out = 60)
probs <- ccp_fn(t_grid, par = true_par)

matplot(t_grid, probs, type = "l", lwd = 2,
        xlab = "system failure time (years)",
        ylab = "P(component j caused failure | T = t)",
        main = "Which component fails when?")
legend("topright",
       legend = c("battery", "electronics", "lead wire", "sensor"),
       col = 1:4, lty = 1:4, lwd = 2, bty = "n")

Three regimes are visible. In early life (\(t < 2\)), the log-logistic sensor and constant-rate electronics compete for dominance — both have small but comparable hazards while the Weibull and Gompertz components are still negligible. In mid-life (\(t \approx 2\)\(6\)), the Weibull battery rapidly takes over as its power-law hazard \(h_1(t) = 3(t/10)^2/10\) scales up. At late life (\(t > 10\)), the Gompertz lead wire competes with the battery — its accelerating hazard \(h_3(t) = 0.02 \cdot e^{0.3 t}\) catches up to the battery’s quadratic growth. The electronics has a flat contribution at rate 0.02 but its share declines because other components ramp up around it.

The fact that these regimes are temporally separated is exactly what makes the individual parameters identifiable from system-level data — each time window provides a different “view” of which component is contributing. Under a single-family assumption (say, “all Weibull”), the model would be forced to absorb all four physical modes into two aggregate parameters, losing both interpretability and the ability to predict which component will fail next.

Defining Your Own Component

The dfr_dist abstraction from flexhaz lets you define a component from nothing more than its hazard function. Suppose we want a bathtub-shaped failure mode with a well-chosen hazard \(h(t) = \alpha t^{-1/2} + \beta + \gamma t\):

library(flexhaz)

bathtub <- dfr_dist(
  rate      = function(t, par, ...) par[1]/sqrt(t) + par[2] + par[3]*t,
  cum_haz_rate = function(t, par, ...) 2*par[1]*sqrt(t) + par[2]*t + par[3]*t^2/2,
  n_par = 3L,
  par = c(0.1, 0.05, 0.01),
  par_lower = c(1e-6, 1e-6, 1e-6)
)

# Plug it into a series system like any other dfr_dist
mixed <- dfr_series_md(components = list(
  bathtub,
  dfr_exponential(0.02)
))

Because maskedhaz never assumes closed-form likelihoods, this works exactly like the built-in families. The log-likelihood, score, Hessian, and MLE are all computed by the same machinery — your custom hazard doesn’t need any special integration code.

Summary