| Type: | Package |
| Title: | A General Framework for Latent Classify and Profile Analysis |
| Version: | 1.0.0 |
| Date: | 2026-01-10 |
| Author: | Haijiang Qin |
| Maintainer: | Haijiang Qin <haijiang133@outlook.com> |
| Description: | A unified latent class modeling framework that encompasses both latent class analysis (LCA) and latent profile analysis (LPA), offering a one-stop solution for latent class modeling. It implements state-of-the-art parameter estimation methods, including the expectation–maximization (EM) algorithm, neural network estimation (NNE; requires users to have 'Python' and its dependent libraries installed on their computer), and integration with 'Mplus' (requires users to have 'Mplus' installed on their computer). In addition, it provides commonly used model fit indices such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), as well as classification accuracy measures such as entropy. The package also includes fully functional likelihood ratio tests (LRT) and bootstrap likelihood ratio tests (BLRT) to facilitate model comparison, along with bootstrap-based and observed information matrix-based standard error estimation. Furthermore, it supports the standard three-step approach for LCA, LPA, and latent transition analysis (LTA) with covariates, enabling detailed covariate analysis. Finally, it includes several user-friendly auxiliary functions to enhance interactive usability. |
| License: | GPL-3 |
| Depends: | R (≥ 4.1.0) |
| Imports: | reticulate, clue, ggplot2, tidyr, dplyr, mvtnorm, Matrix, MASS, MplusAutomation, tidyselect, numDeriv, nloptr, Rcpp (≥ 1.0.0) |
| LinkingTo: | Rcpp, RcppArmadillo |
| RoxygenNote: | 7.3.3 |
| Encoding: | UTF-8 |
| NeedsCompilation: | yes |
| Collate: | 'adjust.response.R' 'check.response.R' 'compare.model.R' 'EM.LCA.R' 'EM.LPA.R' 'get.AvePP.R' 'get.CEP.R' 'get.entropy.R' 'get.fit.index.R' 'get.Log.Lik.LCA.R' 'get.Log.Lik.LPA.R' 'get.Log.Lik.LTA.R' 'get.npar.LCA.R' 'get.npar.LPA.R' 'get.npar.LTA.R' 'get.P.Z.Xn.LCA.R' 'get.P.Z.Xn.LPA.R' 'get.SE.R' 'install_python_dependencies.R' 'Kmeans.LCA.R' 'LCA.R' 'LCPA.R' 'logit.R' 'LPA.R' 'LTA.R' 'LRT.test.R' 'LRT.test.Bootstrap.R' 'LRT.test.VLMR.R' 'Mplus.LCA.R' 'Mplus.LPA.R' 'normalize.R' 'plotResponse.R' 'RcppExports.R' 'rdirichlet.R' 'S3extract.R' 'S3print.R' 'S3summary.R' 'S3update.R' 'sim.correlation.R' 'sim.LCA.R' 'sim.LPA.R' 'sim.LTA.R' 'tools.R' 'utils.R' 'logpdf_component.R' 'zzz.R' |
| Repository: | CRAN |
| Packaged: | 2026-01-17 09:40:24 UTC; Haiji |
| Date/Publication: | 2026-01-22 09:40:08 UTC |
Initialize LCA Parameters via K-means Clustering
Description
Performs hard clustering of observations using K-means algorithm to generate
initial parameter estimates for Latent Class Analysis (LCA) models. This
provides a data-driven initialization strategy that often outperforms random
starts when the number of observed categorical variables I is large
(i.e., I > 50).
Usage
Kmeans.LCA(response, L, nrep = 10)
Arguments
response |
A numeric matrix of dimension |
L |
Integer specifying the number of latent classes. Must be |
nrep |
Integer specifying the number of random starts for K-means algorithm (default: 10). The solution with the lowest within-cluster sum of squares is retained. |
Details
The function executes the following steps:
Data preprocessing: Automatically adjusts non-sequential category values to sequential integers (e.g., categories {1,3,5} become {1,2,3}) using internal adjustment routines.
K-means clustering: Scales variables to mean=0 and SD=1 before clustering. Uses Lloyd's algorithm with Euclidean distance.
Parameter estimation:
For each cluster
l, computes empirical response probabilitiesP(X_i=k|Z=l)for all itemsiand categoriesk.Handles singleton clusters by assigning near-deterministic probabilities (e.g.,
1-10^{-10}for observed category,10^{-10}for others).
Posterior probabilities: Constructs hard-classification matrix where
P(Z=l|\mathbf{X}_n)=1for the assigned cluster and 0 otherwise.
Value
A list containing:
paramsList of initialized parameters:
parAn
L \times I \times K_{\max}array of initial conditional probabilities, whereK_{\max}is the maximum number of categories across items. Dimension order: latent classes (1:L), items (1:I), response categories (1:K_max).P.ZNumeric vector of length
Lcontaining initial class prior probabilities derived from cluster proportions.
P.Z.XnAn
N \times Lmatrix of posterior class probabilities. Contains hard assignments (0/1 values) based on K-means cluster memberships.
Note
Requires at least one observation per cluster. If a cluster has only one observation, probabilities are set to avoid zero values (using
10^{-10}) for numerical stability.Data scaling is applied internally. Variables with zero variance are automatically excluded from clustering.
This function is primarily designed as an initialization method for
LCAand not intended for final model estimation.
Examples
# Simulate response data
set.seed(123)
response <- matrix(sample(1:4, 200, replace = TRUE), ncol = 5)
# Generate K-means initialization for 3-class LCA
init_params <- Kmeans.LCA(response, L = 3, nrep = 5)
# Inspect initial class probabilities
print(init_params$params$P.Z)
Fit Latent Class Analysis Models
Description
This function estimates parameters of a Latent Class Analysis (LCA; Hagenaars & McCutcheon, 2002) model using either the Expectation-Maximization (EM) algorithm or Neural Network Estimation (NNE). It supports flexible initialization strategies and provides comprehensive model diagnostics.
Usage
LCA(
response,
L = 2,
par.ini = "random",
method = "EM",
is.sort = TRUE,
nrep = 20,
starts = 100,
maxiter.wa = 20,
vis = TRUE,
control.EM = NULL,
control.Mplus = NULL,
control.NNE = NULL
)
Arguments
response |
A numeric matrix of dimension |
L |
Integer specifying the number of latent classes (default: 2). |
par.ini |
Specification for parameter initialization. Options include:
|
method |
Character string specifying estimation algorithm:
|
is.sort |
A logical value. If |
nrep |
Integer controlling replication behavior:
|
starts |
Number of random initializations to explore during warm-up phase (default: 100). |
maxiter.wa |
Maximum number of training iterations allowed per warm-up run. After completion,
the top |
vis |
Logical. If |
control.EM |
List of control parameters for EM algorithm:
|
control.Mplus |
List of control parameters for Mplus estimation:
|
control.NNE |
List of control parameters for NNE algorithm:
|
Value
An object of class "LCA" containing:
paramsList with estimated parameters:
parL \times I \times K_{\max}array of conditional response probabilities per latent class.P.ZVector of length
Lwith latent class prior probabilities.
nparNumber of free parameters in the model. see
get.npar.LCALog.LikLog-likelihood of the final model. see
get.Log.Lik.LCAAICAkaike Information Criterion value.
BICBayesian Information Criterion value.
best_BICBest BIC value across
nrepruns (if applicable).P.Z.XnN \times Lmatrix of posterior class probabilities for each observation.P.ZVector of length
Lcontaining the prior probabilities/structural parameters/proportions for each latent class.ZVector of length
Nwith MAP-classified latent class memberships.probabilitySame as
params$par(redundant storage for convenience).Log.Lik.historyVector tracking log-likelihood at each EM iteration.
Log.Lik.nrepVector of log-likelihoods from each replication run.
modelThe optimal neural network model object (only for
method="NNE"). Contains the trained transformer architecture corresponding tobest_loss. This object can be used for further predictions or model inspection.argumentsA list containing all input arguments
EM Algorithm
When method = "EM", parameters are estimated via the Expectation-Maximization algorithm, which iterates between:
-
E-step: Compute posterior class probabilities given current parameters:
P(Z_n = l \mid \mathbf{X}_n) = \frac{\pi_l \prod_{i=1}^I P(X_{ni} = x_{ni} \mid Z_n=l)}{\sum_{k=1}^L \pi_k \prod_{i=1}^I P(X_{ni} = x_{ni} \mid Z_n=k)}where
x_{ni}is the standardized (0-based) response for personnon itemi(seeadjust.response). -
M-step: Update parameters by maximizing expected complete-data log-likelihood:
Class probabilities:
\pi_l^{\text{new}} = \frac{1}{N} \sum_{n=1}^N P(Z_n = l \mid \mathbf{X}_n)Conditional probabilities:
P(X_i = k \mid Z=l)^{\text{new}} = \frac{\sum_{n:x_{ni}=k} P(Z_n = l \mid \mathbf{X}_n)}{\sum_{n=1}^N P(Z_n = l \mid \mathbf{X}_n)}
-
Convergence: Stops when
|\log\mathcal{L}^{(t)} - \log\mathcal{L}^{(t-1)}| < \texttt{tol}or maximum iterations reached.
Neural Network Estimation (NNE)
When method = "NNE", parameters are estimated using a hybrid neural network architecture
that combines feedforward layers with transformer-based attention mechanisms. This approach jointly
optimizes profile parameters and posterior probabilities through stochastic optimization enhanced
with simulated annealing. See install_python_dependencies. Key components include:
Architecture:
- Input Representation
-
Observed categorical responses are converted to 0-based integer indices per item (not one-hot encoded). For example, original responses
[1, 2, 4]become[0, 1, 2]. - Feature Estimator (Feedforward Network)
-
A fully-connected neural network with layer sizes specified by
hidden.layersand activation functionactivation.functionprocesses the integer-indexed responses. This network outputs unnormalized logits for posterior class membership (N \times Lmatrix). - Attention Refiner (Transformer Encoder)
-
A transformer encoder with
nheadattention heads that learns latent class prior probabilities\boldsymbol{\pi} = (\pi_1, \pi_2, \dots, \pi_L)directly from observed responses.Input:
responsematrix (N \times I), whereN= observations,I= continuous variables.Mechanism: Self-attention dynamically weighs variable importance during profile assignment, capturing complex multivariate interactions.
Output: Class prior vector
\boldsymbol{\pi}computed as the mean of posteriors:\pi_l = \frac{1}{N}\sum_{n=1}^N attention(\mathbf{X}_n)This ensures probabilistic consistency with the mixture model framework.
- Profile Parameter Estimation
-
Global conditional probability parameters (
P(X_i = k \mid Z = l)) are stored as learnable parameterspar(anL \times I \times K_{\max}tensor). A masked softmax is applied along categories to enforce:Probabilities sum to 1 within each item-class pair
Non-existent categories (beyond item's actual max response) are masked to zero probability
Optimization Strategy:
-
Hybrid Training Protocol: Alternates between:
-
Gradient-based phase: AdamW optimizer minimizes negative log-likelihood with weight decay regularization:
-\log \mathcal{L} + \lambda \|\boldsymbol{\theta}\|_2^2where
\lambdais controlled bylambda(default: 1e-5). Learning rate decays adaptively when loss plateaus (controlled byscheduler.patienceandscheduler.factor). -
Simulated annealing phase: After gradient-based early stopping (
maxiter.early), parameters are perturbed with noise scaled by temperature:\theta_{\text{new}} = \theta_{\text{current}} + \mathcal{N}(0, \theta_{\text{current}} \times \frac{T}{T_0})Temperature
Tdecays geometrically (T \leftarrow T \times \text{cooling.rate}) frominitial.temperatureuntilthreshold.sais reached. This escapes poor local minima.
Each full cycle (gradient descent + annealing) repeats up to
maxcycletimes. -
-
Model Selection: Across
nreprandom restarts (using Dirichlet-distributed initializations or K-means), the solution with lowest BIC is retained. -
Diagnostics: Training loss, annealing points, and global best solution are plotted when
vis=TRUE.
Mplus
When method = "Mplus", estimation is delegated to external Mplus software.
The function automates the entire workflow:
Workflow:
- Temporary Directory Setup
Creates
inst/Mplusto store:Mplus input syntax (
.inp)Data file in Mplus format (
.dat)Posterior probabilities output (
.dat)
Files are automatically deleted after estimation unless
control.Mplus$clean.files = FALSE.- Syntax Generation
Constructs Mplus syntax with:
-
CLASSES = c1(L)specification forLlatent classes -
CATEGORICALdeclaration for all indicator variables -
ANALYSISblock with optimization controls:TYPE = mixtureStandard mixture modeling setup
STARTS = starts nrepRandom
startsand final stage optimizationsSTITERATIONS = maxiter.wamax itertions during
starts.MITERATIONS = maxiterMaximum EM iterations
CONVERGENCE = tolLog-likelihood convergence tolerance
-
MODELblock with%OVERALL%
-
- Execution
Calls Mplus via
MplusAutomation::mplusModeler(), which:Converts R data to Mplus-compatible format with automatic recoding
Invokes Mplus executable (requires valid license and system PATH configuration)
References
Hagenaars, J. A. , & McCutcheon, A. L. (2002). Applied Latent Class Analysis. United Kingdom: Cambridge University Press.
McLachlan, G. J., & Peel, D. (2004). Finite Mixture Models. Wiley. https://books.google.com.sg/books?id=c2_fAox0DQoC
Examples
# Example with simulated data
set.seed(123)
data.obj <- sim.LCA(N = 500, I = 4, L = 2, IQ=0.9)
response <- data.obj$response
# Fit 2-class model with EM algorithm
fit.em <- LCA(response, L = 2, method = "EM", nrep = 10)
# Fit 2-profile model using Mplus
# need Mplus
# NOTE: 'files.path' in control.Mplus is REQUIRED — function will error if not provided.
# Example creates a timestamped subfolder (e.g., "Mplus_LCA_YYYY-MM-DD_HH-MM-SS") under './'
# to store all temporary Mplus files (.inp, .dat, .out, etc.).
## Not run:
fit.mplus <- LCA(response, L = 2, method = "Mplus", nrep = 3,
control.Mplus = list(files.path = ""))
## End(Not run)
# Fit 2-class model with neural network estimation
# need Python
## Not run:
fit.nne <- LCA(response, L = 2, method = "NNE", nrep = 3)
## End(Not run)
Latent Class/Profile Analysis with Covariates
Description
Implements the three-step estimation method (Vermunt, 2010; Liang et al., 2023) for latent class/profile analysis
with covariates, treating latent class membership as an observed variable with measurement error.
This is mathematically equivalent to a latent transition analysis (LTA) with times=1.
Usage
LCPA(
response,
L = 2,
ref.class = L,
type = "LCA",
covariate = NULL,
CEP.error = TRUE,
par.ini = "random",
params = NULL,
is.sort = TRUE,
constraint = "VV",
method = "EM",
tol = 1e-04,
method.SE = "Bootstrap",
n.Bootstrap = 100,
maxiter = 5000,
nrep = 20,
starts = 100,
maxiter.wa = 20,
vis = TRUE,
control.EM = NULL,
control.Mplus = NULL,
control.NNE = NULL
)
Arguments
response |
A matrix or data frame of observed responses.
Rows of the matrix represent individuals/participants/observations ( |
L |
Integer scalar. Number of latent classes/profiles. Must satisfy |
ref.class |
Integer |
type |
Character string. Specifies the type of latent variable model for Step 1:
|
covariate |
Optional. A matrix or data frame of covariates for modeling latent class membership.
Must include an intercept column (all 1s) as the first column.
If |
CEP.error |
Logical. If |
par.ini |
Specification for parameter initialization. Options include:
|
params |
Optional
|
is.sort |
A logical value. If |
constraint |
Character (LPA only). Specifies structure of within-class covariance matrices:
|
method |
Character. Estimation algorithm for Step 1 models:
|
tol |
Convergence tolerance for log-likelihood difference (default: 1e-4). |
method.SE |
Character. Method for estimating standard errors of parameter estimates:
Default is |
n.Bootstrap |
Integer. Number of bootstrap replicates used when |
maxiter |
Maximum number of iterations for optimizing the regression coefficients. Default: 5000. |
nrep |
Integer controlling replication behavior:
|
starts |
Number of random initializations to explore during warm-up phase (default: 100). |
maxiter.wa |
Maximum number of training iterations allowed per warm-up run. After completion,
the top |
vis |
Logical. If |
control.EM |
List of control parameters for EM algorithm:
|
control.Mplus |
List of control parameters for Mplus estimation:
|
control.NNE |
List of control parameters for NNE algorithm:
|
Value
An object of class LCPA, a named list containing:
betaMatrix of size
p \times L. Coefficients for class membership multinomial logit model. Columns 1 toL-1are free parameters; columnL(reference class) is constrained to\boldsymbol{\beta}_L = \mathbf{0}.beta.seStandard errors for
beta(if Hessian is invertible). Same dimensions asbeta. May containNAif variance-covariance matrix is not positive definite.beta.Z.staZ-statistics for testing null hypothesis that each beta coefficient equals zero. Computed as
beta / beta.se. Same structure asbeta.beta.p.value.tail1One-tailed p-values based on standard normal distribution:
P(Z < -|z|). Useful for directional hypotheses. Same structure asbeta.beta.p.value.tail2Two-tailed p-values:
2 \times P(Z < -|z|). Standard test for non-zero effect. Same structure asbeta.P.Z.XnMatrix of size
N \times Lof posterior class probabilitiesP(Z_n=l \mid \mathbf{X}_n)for each individualnand classl.P.ZVector of length
Lcontaining prior class proportionsP(Z = l)estimated at Step 1.ZVector of length
Ncontaining modal class assignments (MAP classifications)\hat{z}_nfor each individual.nparNumber of free parameters in the model (depends on covariates).
Log.LikObserved-data log-likelihood value at convergence.
Log.Lik.historyVector tracking log-likelihood at each iteration.
AICAkaike Information Criterion value.
BICBayesian Information Criterion value.
iterationsInteger. Number of optimization iterations in Step 3.
coveragedLogical.
TRUEif optimization terminated before reachingmaxiter(suggesting convergence). Note: This is a heuristic indicator; formal convergence diagnostics should check Hessian properties.paramsList. Step 1 model parameters (output from
LCA()orLPA()).callThe matched function call.
argumentsList of all input arguments passed to the function (useful for reproducibility).
Methodology Overview
The three-step procedure follows the same principles as LTA but for a single time point:
Step 1 — Unconditional Latent Class/Profile Model:
Fit an unconditional LCA or LPA model (ignoring covariates). Obtain posterior class membership probabilities
P(Z_n=l \mid \mathbf{X}_n) for each individual n and class l using Bayes' theorem.
Step 2 — Classification Error Probabilities (equal to get.CEP):
Compute the L \times L CEP matrix where element (k,l) estimates:
\text{CEP}(k,l) = P(\hat{Z}_n = l \mid Z_n = k)
using a non-parametric approximation:
\widehat{\text{CEP}}(k,l) = \frac{ \sum_{n=1}^N \mathbb{I}(\hat{z}_n = l) \cdot P(Z_n=k \mid \mathbf{X}_n) }{ \sum_{n=1}^N P(Z_n=k \mid \mathbf{X}_n) }
where \hat{z}_n is the modal class assignment.
Step 3 — Class Membership Model with Measurement Error Correction: Estimate the multinomial logit model for class membership:
P(Z_n = l \mid \mathbf{X}_n) = \frac{\exp(\boldsymbol{\beta}_l^\top \mathbf{X}_n)}{\sum_{k=1}^L \exp(\boldsymbol{\beta}_k^\top \mathbf{X}_n)}
where \mathbf{X}_n = (1, W_{n1}, \dots, W_{nM})^\top is the covariate vector for individual n
(with intercept as first column), and \boldsymbol{\beta}_l = (\beta_{l0}, \beta_{l1}, \dots, \beta_{lM})^\top
contains intercept and regression coefficients. Class L is the reference category
(\boldsymbol{\beta}_L = \mathbf{0}).
The observed-data likelihood integrates over latent classes:
\log \mathcal{L}(\boldsymbol{\beta}) =
\sum_{n=1}^N \log \left[
\sum_{l=1}^L \text{CEP}(l, \hat{z}_n) \cdot P(Z_n=l \mid \mathbf{X}_n)
\right]
Parameters \boldsymbol{\beta} are estimated via maximum likelihood using the BOBYQA algorithm.
Important Implementation Details
Reference Class: Coefficients for the reference class (
ref.class) are ALWAYS fixed to zero (\boldsymbol{\beta}_{ref.class} = \mathbf{0}) in the multinomial logit model.CEP Matrices: When
CEP.error = TRUE, misclassification probabilities are estimated non-parametrically using Step 1 posterior probabilities. This corrects for classification uncertainty. See inget.CEP.Covariate Requirements: Covariate matrix MUST include an intercept column (all 1s) as the first column. Dimensions must be
N \times (M+1), whereMrepresents the number of covariate and1is the Intercept.-
Optimization & Standard Errors:
Step 3 uses BOBYQA algorithm (
nloptr::nloptr) for stable optimization with box constraints.For
method.SE = "Obs": Standard errors derived from inverse Hessian (hessian). If Hessian is singular:Uses Moore-Penrose pseudoinverse (
ginv)Sets negative variances to
NA
For
method.SE = "Bootstrap": Each replicate independently re-estimates Steps 1-3. Failed bootstrap runs yieldNAin SEs and derived statistics. Progress messages include replicate index and optimization diagnostics.
-
Computational Notes:
Step 1 complexity increases with
LandI.Bootstrap is computationally intensive: 100 replicates = 100 full re-estimations of Steps 1-3.
-
Bootstrap Reproducibility: Always set a seed (e.g.,
set.seed(123)) before callingLCPA()when usingmethod.SE = "Bootstrap". Monitor convergence in bootstrap runs via progress messages.
References
Vermunt, J. K. (2010). Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18(4), 450–469. https://doi.org/10.1093/pan/mpq025
Liang, Q., la Torre, J. d., & Law, N. (2023). Latent Transition Cognitive Diagnosis Model With Covariates: A Three-Step Approach. Journal of Educational and Behavioral Statistics, 48(6), 690-718. https://doi.org/10.3102/10769986231163320
Examples
library(LCPA)
set.seed(123)
N <- 2000 # Sample size
L <- 3 # Number of latent classes
I <- 6 # Number of items
# Create covariates (intercept + 2 covariates + 1 interaction)
Intercept = rep(1, N)
X1 <- rnorm(N)
X2 <- rbinom(N, 1, 0.5)
X1.X2 <- X1 * X2
covariate <- cbind(Intercept, X1, X2, X1.X2)
# Simulate data for LPA
sim_data <- sim.LTA(
N = N, I = I, L = L, times = 1, type = "LPA",
covariates = list(covariate), is.sort=TRUE,
beta = matrix(c(
-0.2, 0.0, -0.1, ## fix reference class to class 2
0.2, 0.0, -0.3,
0.8, 0.0, -0.6,
-0.1, 0.0, 0.3
), ncol = L, byrow = TRUE)
)
response <- sim_data$responses[[1]]
## It is strongly recommended to perform the following
## standardization to obtain more stable results when LPA.
## Standardization is not performed here in order to
## compare estimated values with true values.
# response <- normalize(response)
# Fit cross-sectional LPA with covariates
## fix reference class to class 2
# need Mplus
## Not run:
fit <- LCPA(
response = response,
L = L, ref.class = 2,
type = "LPA", is.sort=TRUE,
covariate = covariate,
method.SE = "Obs",
CEP.error = TRUE,
method = "Mplus",
control.Mplus = list(files.path = ""),
vis = TRUE
)
print(fit)
## End(Not run)
Fit Latent Profile Analysis
Description
This function estimates parameters of a Latent Profile Analysis (LPA) model for continuous observed variables using one of three methods: Expectation-Maximization (EM) algorithm, Neural Network Estimation (NNE), or external Mplus software. It supports flexible covariance structures and initialization strategies.
Usage
LPA(
response,
L = 2,
par.ini = "random",
constraint = "VV",
method = "EM",
is.sort = TRUE,
nrep = 20,
starts = 100,
maxiter.wa = 20,
vis = TRUE,
control.EM = NULL,
control.Mplus = NULL,
control.NNE = NULL
)
Arguments
response |
A numeric matrix of dimension |
L |
Integer specifying the number of latent profiles (default: 2). |
par.ini |
Specification for parameter initialization. Options include:
|
constraint |
Character string specifying covariance structure constraints:
|
method |
Character string specifying estimation algorithm:
|
is.sort |
A logical value. If |
nrep |
Integer controlling replication behavior:
|
starts |
Number of random initializations to explore during warm-up phase (default: 100). |
maxiter.wa |
Maximum number of training iterations allowed per warm-up run. After completion,
the top |
vis |
Logical. If |
control.EM |
List of control parameters for EM algorithm:
|
control.Mplus |
List of control parameters for Mplus estimation:
|
control.NNE |
List of control parameters for NNE algorithm:
|
Value
An object of class "LPA" containing:
paramsList with estimated profile parameters:
meansL \times Imatrix of estimated mean vectors for each profile.covsI \times I \times Larray of estimated covariance matrices for each profile.P.ZVector of length
Lwith profile prior probabilities.
nparNumber of free parameters in the model (depends on
constraint).Log.LikLog-likelihood of the final model.
AICAkaike Information Criterion value.
BICBayesian Information Criterion value.
best_BICBest BIC value across
nrepruns (if applicable).P.Z.XnN \times Lmatrix of posterior profile probabilities for each observation.P.ZVector of length
Lcontaining the prior probabilities/structural parameters/proportions for each latent class.ZVector of length
Nwith MAP-classified profile memberships.Log.Lik.historyVector tracking log-likelihood at each EM iteration (only for
method="EM").Log.Lik.nrepVector of log-likelihoods from each replication run.
modelThe optimal model object:
For
method="NNE": Trained neural network model.For
method="Mplus": Estimated Mplus model.
EM Algorithm
When method = "EM", parameter estimation uses the Expectation-Maximization (EM) algorithm to maximize the observed-data log-likelihood:
\log \mathcal{L} = \sum_{n=1}^N \log \left[ \sum_{l=1}^L \pi_l \cdot \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l) \right]
The algorithm iterates between two steps until convergence (change in log-likelihood < tol or max iterations reached):
- E-step:
-
Compute posterior class probabilities (responsibilities) for observation
nand classl:\tau_{nl} = \frac{\pi_l \cdot \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l)}{\sum_{k=1}^L \pi_k \cdot \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}where
\mathcal{N}(\cdot)is the multivariate normal density,\pi_lis the prior class probability, and\boldsymbol{\mu}_l,\boldsymbol{\Sigma}_lare current parameters. Numerical stability is ensured via the log-sum-exp trick. - M-step:
-
Update parameters using responsibilities
\tau_{nl}:Class probabilities:
\pi_l^{\text{new}} = \frac{1}{N}\sum_{n=1}^N \tau_{nl}Class means:
\boldsymbol{\mu}_l^{\text{new}} = \frac{\sum_{n=1}^N \tau_{nl} \mathbf{x}_n}{\sum_{n=1}^N \tau_{nl}}Class covariances: Updated under constraints:
"VV"\boldsymbol{\Sigma}_l^{\text{new}} = \frac{\sum_{n=1}^N \tau_{nl} (\mathbf{x}_n - \boldsymbol{\mu}_l^{\text{new}}) (\mathbf{x}_n - \boldsymbol{\mu}_l^{\text{new}})^\top}{\sum_{n=1}^N \tau_{nl}}"EE"Shared covariance:
\boldsymbol{\Sigma}^{\text{new}} = \frac{\sum_{l=1}^L \sum_{n=1}^N \tau_{nl} (\mathbf{x}_n - \boldsymbol{\mu}_l^{\text{new}}) (\mathbf{x}_n - \boldsymbol{\mu}_l^{\text{new}})^\top}{\sum_{l=1}^L \sum_{n=1}^N \tau_{nl}}"VE"(default) /"EV"Hybrid constraints (e.g.,
"VE": varying variances, equal correlations). Off-diagonal elements use weighted averages across classes; diagonals retain class-specific values.- Custom constraints
User-specified variances/covariances (e.g.,
list(c(1,2), c(2, 2)), meaning the covariates of observed variable 1 and observed variable 2 are equal across latent classes, and the variance of observed variable 2 is equal across classes) are forced equal across classes via weighted averaging.
Covariance matrices are regularized to ensure positive definiteness:
Eigenvalues <
jitter(1e-10) are replaced withjitterFailed Cholesky decompositions trigger diagonal jittering or perturbation of non-constrained elements
Edge Handling:
Empty classes (
\sum_n \tau_{nl} < 10^{-5}) are reinitialized by redistributing responsibilities.Non-finite likelihoods trigger fallback to previous valid parameters or covariance perturbation.
Univariate cases (
I=1) bypass Cholesky decomposition for direct variance updates.
Neural Network Estimation (NNE)
When method = "NNE", parameters are estimated using a hybrid neural network architecture
combining fully-connected layers with transformer-based attention mechanisms. This approach jointly
optimizes profile parameters and posterior probabilities through stochastic optimization with
simulated annealing. See install_python_dependencies. Key components include:
Architecture:
- Input Representation:
-
Continuous observed variables
\mathbf{x}_n \in \mathbb{R}^Iare standardized (mean-centered, unit-variance) internally during training to improve numerical stability. No encoding is needed — raw values are fed directly. - Feature Encoder (Feedforward Network):
-
A multi-layer perceptron with architecture defined by
hidden.layersandactivation.functionmaps the continuous input vector into a latent space of dimensiond.model. This layer learns non-linear feature combinations predictive of latent profile membership. - Attention Refiner (Transformer Encoder)
-
A transformer encoder with
nheadattention heads that learns latent class prior probabilities\boldsymbol{\pi} = (\pi_1, \pi_2, \dots, \pi_L)directly from observed responses.Input:
responsematrix (N \times I), whereN= observations,I= continuous variables.Mechanism: Self-attention dynamically weighs variable importance during profile assignment, capturing complex multivariate interactions.
Output: Class prior vector
\boldsymbol{\pi}computed as the mean of posteriors:\pi_l = \frac{1}{N}\sum_{n=1}^N attention(\mathbf{X}_n)This ensures probabilistic consistency with the mixture model framework.
- Parameter Head (Means & Covariances):
-
Two separate projection heads branch from the transformer output:
Means Head: Linear projection to
L \times Imatrix\boldsymbol{\mu}_l.Covariance Head: Outputs lower-triangular elements of Cholesky factors
\mathbf{L}_lfor each profile. Diagonal elements are passed throughsoftplusto ensure positivity; off-diagonals usetanhscaled by 1.2 to bound magnitude and promote stability. The full covariance is reconstructed via\boldsymbol{\Sigma}_l = \mathbf{L}_l \mathbf{L}_l^\top.
After reconstruction, covariance constraints (e.g.,
"EE","V0", or custom lists) are applied by averaging constrained elements across profiles and re-symmetrizing.
Optimization Strategy:
-
Hybrid Training Protocol: Alternates between:
-
Gradient-based phase: AdamW optimizer minimizes negative log-likelihood with weight decay regularization:
-\log \mathcal{L} + \lambda \|\boldsymbol{\theta}\|_2^2where
\lambdais controlled bylambda(default: 1e-5). Learning rate decays adaptively when loss plateaus (controlled byscheduler.patienceandscheduler.factor). -
Simulated annealing phase: After gradient-based early stopping (
maxiter.early), parameters are perturbed with noise scaled by temperature:\theta_{\text{new}} = \theta_{\text{current}} + \mathcal{N}(0, \theta_{\text{current}} \times \frac{T}{T_0})Temperature
Tdecays geometrically (T \leftarrow T \times \text{cooling.rate}) frominitial.temperatureuntilthreshold.sais reached. This escapes poor local minima.
Each full cycle (gradient descent + annealing) repeats up to
maxcycletimes. -
-
Model Selection: Across
nreprandom restarts (using Dirichlet-distributed initializations or K-means), the solution with lowest BIC is retained. -
Diagnostics: Training loss, annealing points, and global best solution are plotted when
vis=TRUE.
Constraint Handling:
Covariance constraints (
constraint) are enforced after activation via:Shared Parameters: Variances/covariances marked for equality are replaced by their average across profiles.
Positive Definiteness: Non-positive definite matrices are corrected via eigenvalue clamping, diagonal jittering, or adaptive Cholesky decomposition.
Custom constraints: e.g.,
list(c(1,2), c(3,3)), force equality of specific covariance elements across profiles, with symmetry (\sigma_{12} = \sigma_{21}) automatically enforced.
Mplus
When method = "Mplus", estimation is delegated to external Mplus software.
The function automates the entire workflow:
Workflow:
- Temporary Directory Setup
Creates
inst/Mplusto store:Mplus input syntax (
.inp)Data file in Mplus format (
.dat)Posterior probabilities output (
.dat)
Files are automatically deleted after estimation unless
control.Mplus$clean.files = FALSE.- Syntax Generation
Constructs Mplus syntax with:
-
CLASSES = c1(L)specification forLlatent classes -
ANALYSISblock with optimization controls:TYPE = mixtureStandard mixture modeling setup
STARTS = starts nrepRandom
startsand final stage optimizationsSTITERATIONS = maxiter.wamax itertions during
starts.MITERATIONS = maxiterMaximum EM iterations
CONVERGENCE = tolLog-likelihood convergence tolerance
-
MODELblock reflecting the specifiedconstraintstructure
-
- Execution
Calls Mplus via
MplusAutomation::mplusModeler()
, which:
Writes data to disk in Mplus-compatible format
Invokes the Mplus executable (requires valid license)
Captures convergence status and output
Constraint Handling:
Covariance constraints (
constraint) are enforced after activation via:Shared Parameters: Variances/covariances marked for equality are replaced by their average across profiles.
Positive Definiteness: Non-positive definite matrices are corrected via eigenvalue clamping, diagonal jittering, or adaptive Cholesky decomposition.
Custom constraints: e.g.,
list(c(1,2), c(3,3)), force equality of specific covariance elements across profiles, with symmetry (\sigma_{12} = \sigma_{21}) automatically enforced.
References
McLachlan, G. J., & Peel, D. (2004). Finite Mixture Models. Wiley. https://books.google.com.sg/books?id=c2_fAox0DQoC
Examples
# Simulate bivariate continuous data for 2 profiles
set.seed(123)
data.obj <- sim.LPA(N = 500, I = 3, L = 2, constraint = "VV")
response <- data.obj$response
## It is strongly recommended to perform the following
## standardization to obtain more stable results.
## Standardization is not performed here in order to
## compare estimated values with true values.
# response <- normalize(response)
# Fit 2-profile model with VV constraint (default)
fit_vv <- LPA(response, L = 2, constraint = "VV")
# Fit 2-profile model with E0 constraint using neural network estimation
# need Python
## Not run:
fit_e0_nne <- LPA(response, L = 2, constraint = "E0", method = "NNE", nrep = 2)
## End(Not run)
# Fit 2-profile model using Mplus
# Requires Mplus to be installed and available in system PATH.
# NOTE: 'files.path' in control.Mplus is REQUIRED — the function will
# throw an error if not provided.
# This example creates a timestamped subdirectory
# (e.g., "Mplus_LPA_YYYY-MM-DD_HH-MM-SS") under './inst'
# to store all temporary Mplus files (.inp, .dat, .out, etc.).
# The 'inst' directory will be created if it does not exist.
# Setting files.clean=FALSE means temporary files will be preserved after execution.
## Not run:
fit_mplus <- LPA(response, L = 2, method = "Mplus", constraint = list(c(1, 2), c(3, 3)),
control.Mplus = list(files.path = "inst", files.clean=FALSE))
## End(Not run)
Likelihood Ratio Test
Description
Conducts a likelihood ratio test to compare the fit of two models. The test evaluates whether a model with more parameters provides a significantly better fit than a model with fewer parameters.
Usage
LRT.test(object1, object2)
Arguments
object1 |
Fitted model object with fewer parameters (i.e., fewer |
object2 |
Fitted model object with more parameters (i.e., more |
Details
Note that since the small model may be nested within the large model, the result
of LRT.test may not be accurate and is provided for reference only.
More reliable conclusions should be based on a combination of fit indices (i.e., get.fit.index),
classification accuracy measures (i.e., get.entropy, get.AvePP), and a bootstrapped
likelihood-ratio test (i.e., BLRT, LRT.test.Bootstrap, which is very time-consuming).
Above all and the most important criterion, is that the better model is the one that aligns with theoretical
expectations and offers clear interpretability.
The LRT.test test statistic is defined as:
The models must be nested (i.e., the model with fewer parameters is a constrained version of the more one).
Both models must be fit on the identical dataset with the same response variables.
The test statistic asymptotically follows a chi-square distribution.
LRT = -2 \times (\text{LogLik}_{1} - \text{LogLik}_{2})
where:
-
\text{LogLik}_{1}: Log-likelihood of the smaller model (fewer parameters). -
\text{LogLik}_{2}: Log-likelihood of the larger model (more parameters).
Under the null hypothesis (H_0: small model is true), LRT asymptotically follows
a chi-square distribution with df degrees of freedom.
Value
An object of class "htest" containing:
-
statistic: VLMR adjusted test statistic -
parameter: Degrees of freedom (df = npar_2 - npar_1) -
p.value: P-value from\chi^2_dfdistribution -
method: Name of the test -
data.name: Model comparison description
Bootstrap Likelihood Ratio Test
Description
Conducts a bootstrap likelihood ratio test to compare the fit of two nested models. This test evaluates whether a model with more parameters provides a significantly better fit than a model with fewer parameters by approximating the null distribution through parametric bootstrapping.
Usage
LRT.test.Bootstrap(object1, object2, n.Bootstrap = 100, vis = TRUE)
Arguments
object1 |
Fitted model object with fewer parameters (i.e., fewer |
object2 |
Fitted model object with more parameters (i.e., more |
n.Bootstrap |
Number of bootstrap replicates (default = 100). Higher values increase
accuracy but computation time (we suggest that |
vis |
Logical. If |
Details
Note that even the result of LRT.test.Bootstrap should not be taken as the sole
criterion; fit indices (e.g., get.fit.index) and classification accuracy
measures (e.g., get.entropy, get.AvePP) must be considered together.
Above all and the most important criterion, is that the better model is the one that aligns with theoretical
expectations and offers clear interpretability.
The LRT.test.Bootstrap statistic is calculated as:
LRT = -2 \times (\text{LogLik}_{1} - \text{LogLik}_{2})
where:
-
\text{LogLik}_{1}: Log-likelihood of the smaller model (fewer parameters). -
\text{LogLik}_{2}: Log-likelihood of the larger model (more parameters).
The LRT.test.Bootstrap function employs a parametric bootstrap procedure to
empirically estimate the distribution of the LRT statistic under the null hypothesis (that the
smaller model is sufficient). The specific steps are as follows:
Parameter Extraction: The estimated parameters (
params) from the smaller model (object1) are treated as the true population values (ground truth).Data Simulation: The function invokes
sim.LCAorsim.LPAto generaten.Bootstrapindependent datasets. Each dataset maintains the same sample size (N) and number of indicators (I) as the original empirical data.Model Re-fitting: For each simulated dataset, both the small model and the large model are re-fitted. To ensure consistency, the estimation settings (e.g., convergence criteria, iterations) are identical to those used for the original models, with the exception that
LRT.test.Bootstrapforcespar.ini = "random"to avoid local maxima.Distribution Generation: This process generates
n.Bootstrappairs of\text{LogLik}_{1, boot}and\text{LogLik}_{2, boot}, which are used to compute a collection of bootstrap LRT statistics:LRT_{boot} = -2 \times (\text{LogLik}_{1, boot} - \text{LogLik}_{2, boot}).P-value Calculation: The bootstrap
p-value is calculated as the proportion of simulatedLRT_{boot}values that are greater than or equal to the observedLRTstatistic from the original data.
This method is particularly recommended for Latent Class and Latent Profile Analysis because the traditional Chi-square distribution for LRT often fails to hold due to parameters being on the boundary of the parameter space (e.g., probabilities near 0 or 1).
Value
An object of class "htest" containing:
-
statistic: Observed likelihood ratio test statistic -
parameter: Degrees of freedom (reported asNAsince p-value is bootstrap-derived) -
p.value: Bootstrap p-value -
method: Name of the test ("Bootstrap Likelihood Ratio Test") -
data.name: Model comparison description
Lo-Mendell-Rubin likelihood ratio test
Description
Implements the ad-hoc adjusted likelihood ratio test (LRT) described in Formula 15 of Lo, Mendell,
& Rubin (2001), also known as VLMR LRT (Vuong-Lo-Mendell-Rubin Adjusted LRT). This method is typically
used to compare models with L-1 and L classes. If the difference in the number of
classes exceeds 1, conclusions should be interpreted with extreme caution.
Usage
LRT.test.VLMR(object1, object2)
Arguments
object1 |
Fitted model object with fewer parameters (i.e., fewer |
object2 |
Fitted model object with more parameters (i.e., more |
Details
Note that since the small model may be nested within the large model, the result
of LRT.test.VLMR may not be accurate and is provided for reference only.
More reliable conclusions should be based on a combination of fit indices (i.e., get.fit.index),
classification accuracy measures (i.e., get.entropy, get.AvePP), and a bootstrapped
likelihood-ratio test (i.e., BLRT, LRT.test.Bootstrap, which is very time-consuming).
Above all and the most important criterion, is that the better model is the one that aligns with theoretical
expectations and offers clear interpretability.
The LRT.test.VLMR statistic is defined as:
VLMR = \frac{LRT}{c} \quad \text{where} \quad c = 1 + \frac{1}{df \cdot \log(N)}
where:
-
LRTis the standard likelihood ratio statistic. seeLRT.test -
df = npar_2 - npar_1is the difference in number of free parameters between models. -
Nis the sample size.
Under the null hypothesis (H_0: small model is true), VLMR asymptotically follows
a chi-square distribution with df degrees of freedom.
Value
An object of class "htest" containing:
-
statistic: VLMR adjusted test statistic -
parameter: Degrees of freedom (df = npar_2 - npar_1) -
p.value: P-value from\chi^2_dfdistribution -
method: Name of the test -
data.name: Model comparison description
References
Lo, Y., Mendell, N. R., & Rubin, D. B. (2001). Testing the number of components in a normal mixture. Biometrika, 88(3), 767-778. https://doi.org/10.1093/biomet/88.3.767
Nylund-Gibson, K., & Choi, A. Y. (2018). Ten frequently asked questions about latent class analysis. Translational Issues in Psychological Science, 4(4), 440-461. https://doi.org/10.1037/tps0000176
Latent Transition Analysis (LTA)
Description
Implements the three-step estimation method (Vermunt, 2010; Liang et al., 2023) for Latent Transition Analysis (LTA), treating latent class memberships at each time point as observed variables with measurement error. Classification uncertainty from Step 1 (latent class/profile analysis) is explicitly incorporated into the transition model estimation in Step 3, ensuring asymptotically unbiased estimates of transition probabilities and covariate effects. This avoids the bias introduced by "hard" modal-class assignment.
Usage
LTA(
responses,
L = 2,
ref.class = L,
type = "LCA",
covariates = NULL,
CEP.timeCross = FALSE,
CEP.error = TRUE,
covariates.timeCross = FALSE,
par.ini = "random",
params = NULL,
is.sort = TRUE,
constraint = "VV",
method = "EM",
tol = 1e-04,
method.SE = "Bootstrap",
n.Bootstrap = 100,
maxiter = 5000,
nrep = 20,
starts = 100,
maxiter.wa = 20,
vis = TRUE,
control.EM = NULL,
control.Mplus = NULL,
control.NNE = NULL
)
Arguments
responses |
A |
L |
Integer scalar. Number of latent classes/profiles at each time point. Must satisfy |
ref.class |
Integer |
type |
Character string. Specifies the type of latent variable model for Step 1:
|
covariates |
Optional. A |
CEP.timeCross |
Logical. If |
CEP.error |
Logical. If |
covariates.timeCross |
Logical. If |
par.ini |
Specification for parameter initialization. Options include:
|
params |
Optional
|
is.sort |
A logical value. If |
constraint |
Character (LPA only). Specifies structure of within-class covariance matrices:
|
method |
Character. Estimation algorithm for Step 1 models:
|
tol |
Convergence tolerance for log-likelihood difference (default: 1e-4). |
method.SE |
Character. Method for estimating standard errors of parameter estimates:
Default is |
n.Bootstrap |
Integer. Number of bootstrap replicates used when |
maxiter |
Maximum number of iterations for optimizing the regression coefficients. Default: 5000. |
nrep |
Integer controlling replication behavior:
|
starts |
Number of random initializations to explore during warm-up phase (default: 100). |
maxiter.wa |
Maximum number of training iterations allowed per warm-up run. After completion,
the top |
vis |
Logical. If |
control.EM |
List of control parameters for EM algorithm:
|
control.Mplus |
List of control parameters for Mplus estimation:
|
control.NNE |
List of control parameters for NNE algorithm:
|
Value
An object of class LTA, a named list containing:
betaMatrix of size
p_1 \times L. Coefficients for initial class membership multinomial logit model. Columns 1 toL-1are free parameters; columnL(reference class) is constrained to\boldsymbol{\beta}_L = \mathbf{0}.gammaList of length
T-1. Each elementgamma[[t]](for transition from timettot+1) is a nested list:gamma[[t]][[from_class]][[to_class]]returns coefficient vector of lengthp_{t+1}. The last class (L) is reference → coefficients fixed to\boldsymbol{\gamma}_{kl,t+1} = \mathbf{0}for allk.beta.seStandard errors for
beta(if Hessian is invertible). Same dimensions asbeta. May containNAif variance-covariance matrix is not positive definite.gamma.seStandard errors for
gamma, same nested structure. May containNAs.beta.Z.staZ-statistics for testing null hypothesis that each beta coefficient equals zero. Computed as
beta / beta.se. Same structure asbeta.gamma.Z.staZ-statistics for gamma coefficients. Same nested structure as
gamma. Used for testing significance of transition effects.beta.p.value.tail1One-tailed p-values based on standard normal distribution:
P(Z < -|z|). Useful for directional hypotheses. Same structure asbeta.gamma.p.value.tail1One-tailed p-values for gamma coefficients. Same nested structure as
gamma.beta.p.value.tail2Two-tailed p-values:
2 \times P(Z < -|z|). Standard test for non-zero effect. Same structure asbeta.gamma.p.value.tail2Two-tailed p-values for gamma coefficients. Same nested structure as
gamma.P.Z.XnsList of length
T. Each element is anN \times Lmatrix of posterior class probabilitiesP(Z_{nt}=l \mid \mathbf{X}_{nt})for each individualnat timet.P.ZsList of length
T. Each element is a vector of lengthLcontaining prior class proportionsP(Z_t = l)estimated at Step 1 for timet.ZsList of length
T. Each element is a vector of lengthNcontaining modal class assignments (MAP classifications)\hat{z}_{nt}for each individual at timet.nparNumber of free parameters in the model (depends on
covariates).Log.LikObserved-data log-likelihood value at convergence.
Log.Lik.historyVector tracking log-likelihood at each iteration.
AICAkaike Information Criterion value.
BICBayesian Information Criterion value.
iterationsInteger. Number of optimization iterations in Step 3.
coveragedLogical.
TRUEif optimization terminated before reachingmaxiter(suggesting convergence). Note: This is a heuristic indicator; formal convergence diagnostics should check Hessian properties.paramsList. Step 1 model parameters (output from
LCA()orLPA()).callThe matched function call.
argumentsList of all input arguments passed to the function (useful for reproducibility).
Methodology Overview
The three-step LTA proceeds as follows:
Step 1 — Unconditional Latent Class/Profile Model:
At each time point t, fit an unconditional LCA or LPA model (ignoring transitions and covariates).
Obtain posterior class membership probabilities P(Z_{nt}=l \mid \mathbf{X}_{nt}) for each individual n
and class l using Bayes' theorem.
Step 2 — Classification Error Probabilities (equal to get.CEP):
Compute the L \times L CEP matrix for each time point t, where element (k,l) estimates:
\text{CEP}_t(k,l) = P(\hat{Z}_{nt} = l \mid Z_{nt} = k)
using a non-parametric approximation based on posterior weights:
\widehat{\text{CEP}}_t(k,l) = \frac{ \sum_{n=1}^N \mathbb{I}(\hat{z}_{nt} = l) \cdot P(Z_{nt}=k \mid \mathbf{X}_{nt}) }{ \sum_{n=1}^N P(Z_{nt}=k \mid \mathbf{X}_{nt}) }
where \hat{z}_{nt} is the modal (most likely) class assignment for individual n at time t.
Step 3 — Transition Model with Measurement Error Correction: Estimate the multinomial logit models for:
Initial class membership (time 1):
P(Z_{n1} = l \mid \mathbf{X}_{n1}) = \frac{\exp(\boldsymbol{\beta}_l^\top \mathbf{X}_{n1})}{\sum_{k=1}^L \exp(\boldsymbol{\beta}_k^\top \mathbf{X}_{n1})}Transitions (time
t > 1):P(Z_{nt} = l \mid Z_{n,t-1} = k, \mathbf{X}_{nt}) = \frac{\exp(\boldsymbol{\gamma}_{kl t}^\top \mathbf{X}_{nt})}{\sum_{j=1}^L \exp(\boldsymbol{\gamma}_{kj t}^\top \mathbf{X}_{nt})}
where \mathbf{X}_{n1} = (X_{n10}, X_{n11}, \dots, X_{n1M})^\top is the covariate vector for observation/participant n at time 1,
with X_{n10} = 1 (intercept term) and X_{n1m} (m=1,\dots,M) representing the value of the m-th covariate.
The coefficient vector \boldsymbol{\beta}_l = (\beta_{l0}, \beta_{l1}, \dots, \beta_{lM})^\top corresponds element-wise to \mathbf{X}_{n1},
where \beta_{l0} is the intercept and \beta_{lm} (m \geq 1) are regression coefficients for covariates.
Class L is the reference class (\boldsymbol{\beta}_L = \mathbf{0}).
\mathbf{X}_{nt} = (X_{nt0}, X_{nt1}, \dots, X_{ntM})^\top is the covariate vector at time t,
with X_{nt0} = 1 (intercept) and X_{ntm} (m=1,\dots,M) as the m-th covariate value.
The coefficient vector \boldsymbol{\gamma}_{lkt} = (\gamma_{lkt0}, \gamma_{lkt1}, \dots, \gamma_{lktM})^\top
corresponds element-wise to \mathbf{X}_{nt}, where \gamma_{lkt0} is the intercept and \gamma_{lktm} (m \geq 1)
are regression coefficients. Class L is the reference class (\boldsymbol{\gamma}_{lLt} = \mathbf{0} for all l).
The full observed-data likelihood integrates over all possible latent class paths \mathbf{z}_n = (z_{n1},\dots,z_{nT}):
\begin{aligned}
\log \mathcal{L}(\boldsymbol{\theta}) &=
\sum_{n=1}^N \log \Biggl[
\sum_{\mathbf{z}_n \in \{1,\dots,L\}^T}
\Bigl( \prod_{t=1}^T \text{CEP}_t(z_{nt}, \hat{z}_{nt}) \Bigr) \cdot \\
&\quad P(Z_{n1}=z_{n1} \mid \mathbf{X}_{n1}) \cdot \\
&\quad \prod_{t=2}^T P(Z_{nt}=z_{nt} \mid Z_{n,t-1}=z_{n,t-1}, \mathbf{X}_{nt})
\Biggr]
\end{aligned}
Parameters \boldsymbol{\theta} = \{\boldsymbol{\beta}, \boldsymbol{\gamma}\} are estimated via maximum likelihood using
the BOBYQA algorithm (box-constrained derivative-free optimization). Reference class L satisfies \boldsymbol{\beta}_L = \mathbf{0} and \boldsymbol{\gamma}_{kl t} = \mathbf{0} for all k when l = L.
Bootstrap Standard Error Estimation
When method.SE = "Bootstrap", standard errors are estimated using a nonparametric bootstrap procedure:
Draw
B(=n.Bootstrap) independent samples of sizeNwith replacement from the original data.For each bootstrap sample
b=1,\dots,B, re-estimate the full three-step LTA model (Steps 1–3), yielding parameter vector\hat{\boldsymbol{\theta}}^{(b)}.Compute the bootstrap standard error for each parameter as the sample standard deviation across replicates:
\widehat{\mathrm{SE}}_{\mathrm{boot}}(\hat{\theta}_j) = \sqrt{ \frac{1}{B-1} \sum_{b=1}^B \left( \hat{\theta}_j^{(b)} - \bar{\theta}_j \right)^2 },where
\bar{\theta}_j = \frac{1}{B}\sum_{b=1}^B \hat{\theta}_j^{(b)}.
This approach does not rely on large-sample normality or correct specification of the information matrix,
making it particularly suitable for complex models like LTA where analytic derivatives are difficult or unstable.
However, it increases computational cost linearly with B.
Important Implementation Details
Reference Class: The last latent class (
L) is always treated as the reference category. All corresponding coefficients inbetaandgammaare fixed to zero (\boldsymbol{\beta}_L = \mathbf{0},\boldsymbol{\gamma}_{kl t} = \mathbf{0}forl=L).CEP Matrices: When
CEP.error = TRUE, misclassification probabilities are estimated non-parametrically using Step 1 posterior probabilities. This corrects for classification uncertainty. SettingCEP.timeCross = TRUEassumes these error structures are identical across time (measurement invariance). See inget.CEP.Covariate Handling: Covariates for initial status (time 1) and transitions (time
t \geq 2) can differ. For transitions to timet, the covariate matrix must have dimensionsN \times (M_{t}+1), i.e., an intercept column of all1plusM_{t}columns of covariates in timet.Optimization: Step 3 uses L-BFGS-B via
nloptrto ensure numerical stability. Standard errors are derived from the inverse Hessian (viahessian). If singular, Moore-Penrose pseudoinverse (ginv) is used, and negative variances are set toNA.Computational Complexity: Likelihood evaluation requires enumerating
L^Tpossible latent paths.Bootstrap Computation: Each bootstrap iteration re-fits Steps 1–3 independently, including re-estimation of
P.Z.Xns,CEPmatrices and transition parameters. To ensure reproducibility, set a seed before callingLTA()when usingmethod.SE = "Bootstrap". Progress messages during bootstrapping include current replicate index and optimization diagnostics. Users should monitor convergence in each bootstrap run; failed runs will result inNAentries in SEs and derived statistics.
References
Liang, Q., la Torre, J. d., & Law, N. (2023). Latent Transition Cognitive Diagnosis Model With Covariates: A Three-Step Approach. Journal of Educational and Behavioral Statistics, 48(6), 690-718. https://doi.org/10.3102/10769986231163320
McLachlan, G. J., & Peel, D. (2004). Finite Mixture Models. Wiley. https://books.google.com.sg/books?id=c2_fAox0DQoC
Vermunt, J. K. (2010). Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18(4), 450–469. https://doi.org/10.1093/pan/mpq025
Examples
library(LCPA)
set.seed(123)
N <- 2000 ## sample size
L <- 3 ## number of latent class
I <- 6 ## number of variables/items
## Covariates at time point T1
covariates.inter <- rep(1, N) # Intercept term is always 1 for each individual
covariates.X11 <- rnorm(N) # Covariate X1 is a continuous variable
# Combine into covariates at T1
covariates.T1 <- cbind(Intercept=covariates.inter, X1=covariates.X11)
## Covariates at time point T2
covariates.inter <- rep(1, N) # Intercept term is always 1 for each individual
covariates.X21 <- rnorm(N) # Covariate X1 is a continuous variable
# Combine into covariates at T1
covariates.T2 <- cbind(Intercept=covariates.inter, X1=covariates.X21)
# Combine into final covariates list
covariates <- list(t1=covariates.T1, t2=covariates.T2)
## Simulate beta coefficients
# last column is zero because the last category is used as reference
## fix reference class to class 2
beta <- matrix(c( 0.8, 0.0, 0.1,
-0.3, 0.0, -0.5), ncol=L, byrow=TRUE)
## Simulate gamma coefficients
gamma <- list(
lapply(1:L, function(l) {
lapply(1:L, function(k) if(k < L)
runif(2, -2.0, 2.0) else c(0, 0)) # Last class as reference
})
)
## Simulate the data
sim_custom <- sim.LTA(
N=N, I=I, L=L, times=2, type="LCA", IQ=0.9,
covariates=covariates,
beta=beta,
gamma=gamma
)
summary(sim_custom)
responses <- sim_custom$responses
covariates <- sim_custom$covariates
## fix reference class to class 2
LTA.obj <- LTA(responses, L=L, ref.class=2, type="LCA",
covariates=covariates,
method.SE="Bootstrap", n.Bootstrap=10,
CEP.timeCross=FALSE, CEP.error=TRUE, covariates.timeCross=FALSE,
par.ini = "random", method="EM", vis = TRUE)
print(LTA.obj)
Adjust Categorical Response Data for Polytomous Items
Description
Standardizes polytomous response data by converting raw category values to consecutive integers starting from 0. Records original category values for potential reverse transformation. Handles varying numbers of response categories across items.
Usage
adjust.response(response)
Arguments
response |
A matrix or data frame containing response data where:
Non-numeric columns will be coerced to numeric with warning. |
Details
The function processes each item column independently:
Extracts unique response values and sorts them in ascending order
Maps smallest value to 0, second smallest to 1, etc.
Records original values in
poly.origfor possible reverse transformationHandles items with different numbers of categories through NA-padding
Missing values (NA) in input are preserved as NA in output.
Value
A named list containing:
poly.origI \times K_{max}matrix. Original sorted category values for each item. Rows correspond to items, columns to category positions. Empty cells filled withNA.poly.valueInteger vector of length
I. Number of unique response categories per item.poly.maxScalar integer. Maximum number of categories across all items, i.e.,
K_{max}.responseN \times Imatrix. Adjusted response data where original values are replaced by zero-based category indices (0 tok-1forkcategories).
Examples
# Simulate response data with 3 items and varying categories
set.seed(123)
resp <- data.frame(
item1 = sample(1:3, 10, replace = TRUE),
item2 = sample(c(0, 5, 10), 10, replace = TRUE),
item3 = sample(1:2, 10, replace = TRUE)
)
# Apply adjustment
adjusted <- adjust.response(resp)
# Inspect results
str(adjusted)
print(adjusted$poly.orig) # Original category values
print(adjusted$response) # Standardized responses
Validate response matrix against expected polytomous category counts
Description
Checks whether each column in the response matrix contains exactly the number
of unique response categories specified in poly.value. Handles edge cases
where all items have identical category counts efficiently.
Usage
check.response(response, poly.value)
Arguments
response |
A numeric matrix of dimension
Each cell contains the observed response value for a subject on an item. |
poly.value |
An integer vector of length |
Value
Logical value indicating validation status:
-
TRUEif either:All columns have identical numbers of unique values (regardless of
poly.valuespecification)Each column's unique value count matches its corresponding
poly.valueentry
-
FALSEif any column's unique value count mismatches its specifiedpoly.value(when columns have varying category counts)
Note
This function contains a specific behavior: When all items have identical numbers of
unique response categories, it returns TRUE immediately without validating against
poly.value. This may lead to unexpected results if poly.value contains
inconsistent expectations. Users should ensure poly.value accurately reflects
their measurement model.
Examples
# Valid case: Matching category counts
resp_matrix <- matrix(c(1,1,2,2, 1,2,3,1), ncol = 2)
check.response(resp_matrix, poly.value = c(2, 3)) # Returns TRUE
# Invalid case: Mismatched category counts
check.response(resp_matrix, poly.value = c(2, 2)) # Returns FALSE
# Special case: Uniform category counts bypass poly.value check
uniform_resp <- matrix(rep(1:2, each = 4), ncol = 2)
check.response(uniform_resp, poly.value = c(2, 5)) # Returns TRUE (bypass behavior)
Model Comparison Tool
Description
Compares two nested latent class/profile models using multiple fit indices, likelihood ratio tests, and classification metrics.
Usage
compare.model(object1, object2, n.Bootstrap = 0)
Arguments
object1 |
An object of class |
object2 |
An object of class |
n.Bootstrap |
Integer specifying the number of bootstrap replications for the parametric
bootstrap likelihood ratio test (BLRT). Default is |
Details
This function performs comprehensive model comparison between two nested LCA/LPA models. Key features include:
Automatically orders models by parameter count (smaller model first)
Computes multiple fit indices via
get.fit.indexCalculates classification quality metrics (entropy, average posterior probabilities)
Performs three types of likelihood ratio tests:
Standard LRT, see
LRT.testVLMR adjusted LRT, see
LRT.test.VLMRParametric bootstrap LRT (computationally intensive but robust), see
LRT.test.Bootstrap
Computes Bayes Factor using Sample-Size Adjusted BIC (SIC)
Important Requirements:
Both models must be of the same type (
LCAorLPA)Models must be nested (one model is a constrained version of the other)
-
n.Bootstrap > 0requires significant computational resources
Value
An object of class compare.model containing:
nparNamed vector with number of free parameters for each model
entropyNamed vector with entropy values (classification accuracy measure) for each model
AvePPList containing average posterior probabilities per latent class/profile
fit.indexList of
get.fit.indexobjects for both modelsBFBayes Factor for model comparison (based on SIC)
LRT.objLikelihood ratio test (LRT) results
LRT.VLMR.objVuong-Lo-Mendell-Rubin (VLMR) adjusted LRT results
LRT.Bootstrap.objBootstrap LRT results (if
n.Bootstrap > 0)callThe matched function call
argumentsList containing the original arguments passed to the function
See Also
LCA, LPA, get.fit.index,
extract, LRT.test, LRT.test.VLMR
Examples
library(LCPA)
set.seed(123)
data.obj <- sim.LPA(N = 500, I = 5, L = 3, constraint = "V0")
response <- data.obj$response
# need Mplus
## Not run:
# Compare 3-class vs 4-class LPA models
object1 <- LPA(response, L = 3, method = "Mplus", constraint = "V0")
object2 <- LPA(response, L = 4, method = "Mplus", constraint = "V0")
compare.model.obj <- compare.model(object1, object2)
print(compare.model.obj)
## End(Not run)
S3 Methods: extract
Description
A generic S3 extractor function designed to retrieve internal components from various model and simulation objects
produced by the LCPA package. This function provides a consistent interface across different classes,
allowing users to access estimated parameters, fit statistics, simulation truths, standard errors, and more.
Usage
extract(object, what, ...)
## S3 method for class 'LCA'
extract(object, what, ...)
## S3 method for class 'LPA'
extract(object, what, ...)
## S3 method for class 'LCPA'
extract(object, what, ...)
## S3 method for class 'LTA'
extract(object, what, ...)
## S3 method for class 'sim.LCA'
extract(object, what, ...)
## S3 method for class 'sim.LPA'
extract(object, what, ...)
## S3 method for class 'sim.LTA'
extract(object, what, ...)
## S3 method for class 'fit.index'
extract(object, what, ...)
## S3 method for class 'compare.model'
extract(object, what, ...)
## S3 method for class 'SE'
extract(object, what, ...)
Arguments
object |
An object of one of the following classes:
|
what |
A character string specifying the name of the component to extract.
Valid choices depend on the class of |
... |
Additional arguments passed to methods (currently ignored). |
Details
This function supports extraction from ten primary object classes. Below are available components for each:
LCALatent Class Analysis model results. Available components:
paramsList containing all estimated model parameters.
par3D array (
L \times I \times K_{\max}) of conditional response probabilities.P.ZVector of length
Lwith latent class prior probabilities.nparNumber of free parameters in the model.
Log.LikLog-likelihood of the final model.
AICAkaike Information Criterion.
BICBayesian Information Criterion.
best_BICBest BIC value across replication runs (if
nrep > 1).P.Z.XnN \times Lmatrix of posterior class probabilities.ZVector of length
Nwith MAP-classified latent class memberships.probabilityList of formatted conditional probability matrices per item.
Log.Lik.historyVector tracking log-likelihood at each EM iteration.
Log.Lik.nrepVector of log-likelihoods from each replication run.
modelTrained neural network model object (only when
method="NNE").callThe original function call used for model estimation.
argumentsList containing all input arguments passed to the
LCAfunction.
LPALatent Profile Analysis model results. Available components:
paramsList containing all estimated model parameters.
meansL \times Imatrix of estimated mean vectors for each profile.covsI \times I \times Larray of estimated covariance matrices.P.ZVector of length
Lwith profile prior probabilities.nparNumber of free parameters (depends on
constraint).Log.LikLog-likelihood of the final model.
AICAkaike Information Criterion.
BICBayesian Information Criterion.
best_BICBest BIC value across replication runs (if
nrep > 1).P.Z.XnN \times Lmatrix of posterior profile probabilities.ZVector of length
Nwith MAP-classified profile memberships.Log.Lik.historyVector tracking log-likelihood at each EM iteration.
Log.Lik.nrepVector of log-likelihoods from each replication run.
modelTrained model object (neural network or Mplus).
callThe original function call used for model estimation.
argumentsList containing all input arguments passed to the
LPAfunction.constraintCovariance structure constraints applied during estimation (from original arguments).
LCPALatent Class/Profile Analysis (with covariates). Available components:
betaInitial class coefficients (p1 x L matrix).
beta.seStandard errors for beta.
beta.Z.staZ-statistics for beta.
beta.p.value.tail1One-tailed p-values for beta.
beta.p.value.tail2Two-tailed p-values for beta.
P.Z.XnPosterior probabilities (N x L).
P.ZPrior proportions (length L).
ZModal class assignments (length N).
nparNumber of free parameters.
Log.LikLog-likelihood.
AICAIC.
BICBIC.
iterationsOptimization iterations in Step 3.
coveragedLogical: did optimization converge early?
paramsStep 1 model parameters (LCA/LPA output).
callFunction call.
argumentsInput arguments list.
LTALatent Transition Analysis model results. Available components:
betaInitial class coefficients (p1 x L matrix).
gammaTransition coefficients (nested list).
beta.seStandard errors for beta.
gamma.seStandard errors for gamma.
beta.Z.staZ-statistics for beta.
gamma.Z.staZ-statistics for gamma.
beta.p.value.tail1One-tailed p-values for beta.
gamma.p.value.tail1One-tailed p-values for gamma.
beta.p.value.tail2Two-tailed p-values for beta.
gamma.p.value.tail2Two-tailed p-values for gamma.
P.Z.XnsList of posterior probabilities per time (each N x L).
P.ZsList of prior proportions per time (each length L).
ZsList of modal class assignments per time (each length N).
nparNumber of free parameters.
Log.LikLog-likelihood.
AICAIC.
BICBIC.
iterationsOptimization iterations in Step 3.
coveragedLogical: did optimization converge early?
paramsStep 1 model parameters (LCA/LPA output).
callFunction call.
argumentsInput arguments list.
sim.LCASimulated Latent Class Analysis data. Available components:
responseInteger matrix (
N \times I) of simulated categorical observations.parArray (
L \times I \times P_{\max}) of true class-specific category probabilities.ZInteger vector (length
N) of true latent class assignments.P.ZNumeric vector (length
L) of true class proportions.poly.valueInteger vector (length
I) specifying categories per variable.P.Z.XnBinary matrix (
N \times L) of true class membership indicators.callThe original function call used for simulation.
argumentsList containing all input arguments passed to
sim.LCA.
sim.LPASimulated Latent Profile Analysis data. Available components:
responseNumeric matrix (
N \times I) of simulated continuous observations.meansL \times Imatrix of true class-specific means.covsI \times I \times Larray of true class-specific covariance matrices.P.Z.XnN \times Lmatrix of true class membership indicators.P.ZNumeric vector (length
L) of true class proportions.ZInteger vector (length
N) of true profile assignments.constraintOriginal constraint specification passed to
sim.LPA.callThe original function call used for simulation.
argumentsList containing all input arguments passed to
sim.LPA.
sim.LTASimulated Latent Transition Analysis data. Available components:
responsesList of response matrices per time point.
ZsList of true latent class assignments per time.
P.ZsList of true class proportions per time.
parTrue conditional probabilities (for categorical items).
meansTrue profile means (for continuous variables).
covsTrue covariance matrices per class and time.
poly.valueCategories per variable (for categorical items).
rateTransition rate matrix or structure.
covariatesSimulated covariate matrix.
betaTrue initial class coefficients.
gammaTrue transition coefficients.
callOriginal simulation function call.
argumentsInput arguments used in simulation.
fit.indexModel fit indices object. Available components:
nparNumber of free parameters in the model.
Log.LikLog-likelihood of the model.
-2LLDeviance statistic (-2 times log-likelihood).
AICAkaike Information Criterion.
BICBayesian Information Criterion.
SICSample-Size Adjusted BIC (-0.5 * BIC).
CAICConsistent AIC.
AWEApproximate Weight of Evidence.
SABICSample-Size Adjusted BIC (alternative formulation).
callOriginal function call that generated the fit indices.
argumentsList containing input arguments (includes original model object).
compare.modelModel comparison results. Available components:
nparNamed numeric vector with free parameters for each model (
model1,model2).entropyNamed numeric vector with entropy values for each model.
AvePPList of average posterior probabilities per class/profile for each model.
fit.indexList of
get.fit.indexobjects for both models.BFBayes Factor comparing models (based on SIC differences).
LRT.objStandard likelihood ratio test results (requires nested models).
LRT.VLMR.objVuong-Lo-Mendell-Rubin adjusted likelihood ratio test results.
LRT.Bootstrap.objParametric bootstrap likelihood ratio test results (if
n.Bootstrap > 0).callThe matched function call used for comparison.
argumentsList containing original input arguments (
object1,object2,n.Bootstrap).
SEStandard error estimation results. Available components:
seList containing standard errors for parameters (components depend on model type).
vcovVariance-covariance matrix (only for
method="Obs").hessianObserved information matrix (only for
method="Obs").diagnosticsMethod-specific diagnostic information (e.g., estimation method).
callFunction call that generated the object.
argumentsInput arguments used in estimation.
meansStandard errors for profile means (LPA models only — accessed via
selist).covsStandard errors for covariance parameters (LPA models only — accessed via
selist).P.ZStandard errors for class proportions (both LCA/LPA — accessed via
selist).parStandard errors for conditional probabilities (LCA models only — accessed via
selist).
Value
The requested component. Return type varies depending on what and the class of object.
If an invalid what is provided, an informative error is thrown listing valid options.
Methods (by class)
-
extract(LCA): Extract fields from aLCAobject -
extract(LPA): Extract fields from aLPAobject -
extract(LCPA): Extract fields from aLCPAobject -
extract(LTA): Extract fields from aLTAobject -
extract(sim.LCA): Extract fields from asim.LCAobject -
extract(sim.LPA): Extract fields from asim.LPAobject -
extract(sim.LTA): Extract fields from asim.LTAobject -
extract(fit.index): Extractor method forfit.indexobjects -
extract(compare.model): Extract fields from acompare.modelobject -
extract(SE): Extract fields from aSEobject
Usage Notes
For
LCA,LPA,LCPA, andLTAobjects, components reflect estimated parameters.For
sim.LCA,sim.LPA, andsim.LTAobjects, components reflect true data-generating parameters.In
SEobjects:Top-level components like
vcovandhessianare only available whenmethod = "Obs". Requesting them underBootstraptriggers a warning and returnsNULL.Parameter-specific SEs (e.g.,
means,par) are stored within theselist. You can extract them directly by name (e.g.,extract(se_obj, "means")).Attempting to extract unavailable parameter SEs (e.g.,
parfrom an LPA model) triggers an error with available options.
For
fit.indexandcompare.modelobjects, valid components are dynamically determined from the object’s names.All methods ignore additional arguments (
...).
Examples
set.seed(123)
# Simulate LPA data: 500 observations, 3 continuous variables, 2 latent profiles
# Constraint "E0": Equal variances across classes, zero covariances
data.obj <- sim.LPA(N = 500, I = 3, L = 2, constraint = "E0")
# Extract the simulated response matrix (N x I) for model fitting
response <- extract(data.obj, "response")
# Extract the TRUE covariance matrices (I x I x L array)
extract(data.obj, "covs")
# Fit an LPA model to the simulated data using the SAME constraint ("E0")
fit_E0 <- LPA(response, L = 2, constraint = "E0")
# Extract the ESTIMATED covariance matrices from the fitted model
extract(fit_E0, "covs")
# Simulate LCA data: 30 observations, 5 categorical items, 3 latent classes
sim_data <- sim.LCA(N = 30, I = 5, L = 3)
# Extract the TRUE conditional probability array
extract(sim_data, "par")
Calculate Average Posterior Probability (AvePP)
Description
Computes the average posterior probability for the most likely class assignment
in latent class/profile analysis. This metric quantifies classification precision.
The total average posterior probability \geq 0.70 (Nylund-Gibson & Choi, 2018) indicate adequate classification quality.
Usage
get.AvePP(object)
Arguments
object |
An object of class
|
Value
A (L+1) \times (L+1) matrix with the following structure:
Rows: Represent each latent class (1 to L) and a final "Total" row.
Columns: Represent each latent class (1 to L) and a final "Total" column.
Diagonal elements
\text{ave}[l,l]: Average posterior probability for observations assigned to classl. That is,\overline{P}_{ll} = \frac{1}{N_l} \sum_{n: \hat{z}_n = l} p_{nl},where
N_lis the number of observations assigned to classl, and\hat{z}_n = \arg\max_{l'} p_{nl'}.Off-diagonal elements
\text{ave}[l,k](l \ne k): Average posterior probability of classkamong observations assigned to classl. Useful for assessing classification confusion.\overline{P}_{lk} = \frac{1}{N_l} \sum_{n: \hat{z}_n = l} p_{nk}.Bottom-right corner
\text{ave}[L+1,L+1]: Overall average posterior probability across all observations,\overline{P}_{\text{total}} = \frac{1}{N} \sum_{n=1}^N \max_{l} p_{nl}.
Note
Classification quality is considered acceptable if \overline{P}_{\text{total}} \geq 0.70 (Nylund-Gibson & Choi, 2018).
References
Nylund-Gibson, K., & Choi, A. Y. (2018). Ten frequently asked questions about latent class analysis. Translational Issues in Psychological Science, 4(4), 440-461. https://doi.org/10.1037/tps0000176
Examples
# Example with simulated data
set.seed(123)
data.obj <- sim.LCA(N = 500, I = 4, L = 2, IQ=0.9)
response <- data.obj$response
# Fit 2-class model with EM algorithm
fit.em <- LCA(response, L = 2, method = "EM", nrep = 10)
AvePP_value <- get.AvePP(fit.em)
print(AvePP_value)
Compute Classification Error Probability (CEP) Matrices
Description
Computes the Classification Error Probability (CEP) matrices (Liang et al., 2023) used in the bias-corrected three-step estimation of Latent Class/Profile Analysis with Covariates.
Usage
get.CEP(P.Z.Xns, time.cross = TRUE)
Arguments
P.Z.Xns |
A list of length
The list must be ordered chronologically (e.g., time 1 to |
time.cross |
Logical. If |
Details
The CEP matrix at time t gives the probability that an individual truly belongs
to latent class l' given that they were assigned (via modal assignment)
to class l at time t.
Formally, for time point t:
\mathrm{CEP}_t(l, l') =
P(Z_t = l \mid \hat{Z}_t = l')
=
\frac{
\sum_{i:\,\hat{z}_{it} = l'}
P(Z_{it} = l \mid X_i)
}{
N \, \hat{\pi}_{tl}
}
where:
-
Z_{it}is the true latent class of individualiat timet; -
P(Z_{it} = l \mid X_i)is the posterior probability from the first-step model; -
\hat{z}_{it} = \arg\max_l P(Z_{it} = l' \mid X_i)is the modal (most likely) assigned class; -
\hat{\pi}_{tl} = \frac{1}{N} \sum_{i=1}^N I(\hat{z}_{it} = l)is the observed proportion assigned to classlat timet; -
Nis the total sample size.
If time.cross = TRUE (default), a single pooled CEP matrix is computed by
aggregating counts across all time points. This assumes the classification error
structure is invariant over time (i.e., measurement invariance), as in
Liang et al. (2023). The same pooled matrix is then returned for every time point.
Value
A named list of length T. Each element is an L \times L matrix:
Row
l: true latent class;Column
l': individuals assigned to classl';Entry
(l, l'): estimatedP(\text{assigned class} = l' \mid \text{true class} = l).
When time.cross = TRUE, all matrices in the list are identical.
Names are "t1", "t2", ..., "tT".
Note
Assumes complete data (no missing values in posterior matrices).
All matrices in
P.Z.Xnsmust have identical dimensions (sameNandL).Assignment is based on modal class (
which.max).If no individual is assigned to a class at a time point, division by zero may occur.
References
Liang, Q., la Torre, J. d., & Law, N. (2023). Latent Transition Cognitive Diagnosis Model With Covariates: A Three-Step Approach. Journal of Educational and Behavioral Statistics, 48(6), 690-718. https://doi.org/10.3102/10769986231163320
Examples
# Simulate posterior probabilities for 2 time points, 3 classes, 100 individuals
set.seed(123)
N <- 100; L <- 3; times <- 2
P.Z.Xns <- replicate(times,
t(apply(matrix(runif(N * L), N, L), 1, function(x) x / sum(x))),
simplify = FALSE)
# Compute time-specific CEP matrices
cep_time_specific <- get.CEP(P.Z.Xns, time.cross = FALSE)
# Compute time-invariant (pooled) CEP matrix
cep_pooled <- get.CEP(P.Z.Xns, time.cross = TRUE)
Calculate Log-Likelihood for Latent Class Analysis
Description
Computes the log-likelihood of observed categorical data under a Latent Class Analysis (LCA) model given class probabilities and conditional response probabilities. The calculation assumes local independence of responses conditional on latent class membership.
Usage
get.Log.Lik.LCA(response, P.Z, par)
Arguments
response |
A numeric matrix of dimension
|
P.Z |
A numeric vector of length
|
par |
A 3-dimensional array of dimension
|
Details
The log-likelihood calculation follows these steps:
Response Standardization: Original responses are converted to 0-based integers using
adjust.response. For example, original values {1,2,5} become {0,1,2} (ordered and relabeled sequentially).Class-Specific Likelihood: For each observation
nand classl, compute:P(\mathbf{X}_n \mid Z_n=l) = \prod_{i=1}^I P(X_{ni} = x_{ni} \mid Z_n=l)where
x_{ni}is the standardized response value, and probabilities are taken frompar[l, i, x_{ni}+1].Marginal Likelihood: For each observation
n, combine class-specific likelihoods weighted by class probabilities:P(\mathbf{X}_n) = \sum_{l=1}^L \pi_l \cdot P(\mathbf{X}_n \mid Z_n=l)Log Transformation: Sum log-transformed marginal likelihoods across all observations:
\log \mathcal{L} = \sum_{n=1}^N \log P(\mathbf{X}_n)
Value
A single numeric value representing the total log-likelihood:
\log \mathcal{L} = \sum_{n=1}^N \log \left[ \sum_{l=1}^L \pi_l \prod_{i=1}^I P(X_{ni} = x_{ni} \mid Z=l) \right]
where x_{ni} is the standardized (0-based) response for person n on item i.
Calculate Log-Likelihood for Latent Profile Analysis
Description
Computes the log-likelihood of observed continuous data under a Latent Profile Analysis (LPA) model with multivariate normal distributions within each latent profile. Implements robust numerical techniques to handle near-singular covariance matrices.
Usage
get.Log.Lik.LPA(response, P.Z, means, covs, jitter = 1e-10)
Arguments
response |
A numeric matrix of dimension |
P.Z |
A numeric vector of length
|
means |
A matrix of dimension |
covs |
An array of dimension |
jitter |
A small positive constant (default: 1e-10) added to diagonal elements of covariance matrices to ensure numerical stability during Cholesky decomposition. |
Details
The log-likelihood calculation follows these steps:
Covariance Stabilization: Each covariance matrix
\boldsymbol{\Sigma}_lis symmetrized as(\boldsymbol{\Sigma}_l + \boldsymbol{\Sigma}_l^\top)/2. If Cholesky decomposition fails:Add
jitterto diagonal elements iteratively (up to 10 attempts, scaling jitter by 10x each attempt)Fall back to diagonal covariance matrix if decomposition still fails
Profile-Specific Density for observation
nin profilel:\log f(\mathbf{x}_n \mid Z_n=l) = -\frac{I}{2}\log(2\pi) - \frac{1}{2}\log|\boldsymbol{\Sigma}_l| - \frac{1}{2}(\mathbf{x}_n - \boldsymbol{\mu}_l)^\top \boldsymbol{\Sigma}_l^{-1} (\mathbf{x}_n - \boldsymbol{\mu}_l)Computed efficiently using Cholesky decomposition
\boldsymbol{\Sigma}_l = \mathbf{R}^\top\mathbf{R}where applicable.Joint Probability for observation
nand profilel:\log[\pi_l \cdot f(\mathbf{x}_n \mid Z_n=l)] = \log(\pi_l) + \log f(\mathbf{x}_n \mid Z_n=l)\log(\pi_l)uses\log(\pi_l + 10^{-12})to avoid undefined values.Marginal Likelihood per observation using log-sum-exp trick for numerical stability:
\log f(\mathbf{x}_n) = a_{\max} + \log\left( \sum_{l=1}^L \exp\left\{ \log[\pi_l \cdot f(\mathbf{x}_n \mid Z_n=l)] - a_{\max} \right\} \right)where
a_{\max} = \max_l \log[\pi_l \cdot f(\mathbf{x}_n \mid Z_n=l)].Total Log-Likelihood: Sum of
\log f(\mathbf{x}_n)across all observationsn=1,\dots,N.
Value
A single numeric value representing the total log-likelihood:
\log \mathcal{L} = \sum_{n=1}^N \log \left[ \sum_{l=1}^L \pi_l \cdot \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l) \right]
where \mathcal{N}(\cdot) denotes the multivariate normal density function.
Note
Critical implementation details:
Cholesky Decomposition: For non-degenerate cases (
I>1), used to compute:\log|\boldsymbol{\Sigma}_l| = 2\sum_{i=1}^I \log(r_{ii})and quadratic form\|\mathbf{R}^{-\top}(\mathbf{x}_n - \boldsymbol{\mu}_l)\|^2Univariate Handling: When
I=1, computes density directly without decompositionNumerical Safeguards:
Densities clamped to
-10^{10}when non-finiteMarginal likelihoods clamped to
-10^{10}when non-finiteExplicit dimension checks for
meansandcovs
Assumptions:
Multivariate normality within profiles
No missing data in
responsePositive-definite covariance matrices (after stabilization)
Calculate Log-Likelihood for Latent Transition Analysis
Description
Computes the observed-data log-likelihood for a Latent Transition Analysis (LTA) model
using the three-step approach with measurement error correction. The likelihood integrates over
all possible latent class paths while incorporating classification uncertainty via
Classification Error Probability (CEP) matrices. This function is designed to work with
parameters estimated from the LTA function.
Usage
get.Log.Lik.LTA(
params,
CEP,
P.Z.Xns,
Zs,
covariates,
covariates.timeCross = FALSE
)
Arguments
params |
A named
|
CEP |
A
where |
P.Z.Xns |
A
the posterior probability of individual |
Zs |
A |
covariates |
A |
covariates.timeCross |
Logical. If |
Details
The log-likelihood calculation follows these steps:
Latent Path Enumeration: All
L^Tpossible latent class trajectories\mathbf{z}_nare generated and cached.-
Initial Class Probabilities (time 1): For individual
n, compute using multinomial logit with covariates\mathbf{X}_{n1}:P(Z_{n1} = l \mid \mathbf{X}_{n1}) = \frac{\exp(\boldsymbol{\beta}_l^\top \mathbf{X}_{n1})} {\sum_{k=1}^L \exp(\boldsymbol{\beta}_k^\top \mathbf{X}_{n1})}where
\boldsymbol{\beta}_L = \mathbf{0}(reference class constraint). Numerical stabilization is applied via subtraction of the maximum linear predictor. Transition Probabilities (times
t \geq 2): For transition from classkat timet-1to classlat timet:P(Z_{nt} = l \mid Z_{n,t-1} = k, \mathbf{X}_{nt}) = \frac{\exp(\boldsymbol{\gamma}_{kl t}^\top \mathbf{X}_{nt})} {\sum_{j=1}^L \exp(\boldsymbol{\gamma}_{kj t}^\top \mathbf{X}_{nt})}where
\boldsymbol{\gamma}_{kL t} = \mathbf{0}for allk(reference class constraint).Path-Specific Likelihood: For each path
\mathbf{z}_nand individualn:Compute path probability:
P(Z_{n1}=z_{n1} \mid \mathbf{X}_{n1}) \times \prod_{t=2}^T P(Z_{nt}=z_{nt} \mid Z_{n,t-1}=z_{n,t-1}, \mathbf{X}_{nt})Apply CEP weights:
\prod_{t=1}^T P(\hat{Z}_{nt} = \hat{z}_{nt} \mid Z_{nt} = z_{nt}) = \prod_{t=1}^T \text{CEP}_t(z_{nt}, \hat{z}_{nt})Multiply path probability by CEP weights
-
Marginalization: Sum path-specific likelihoods over all
L^Tpaths for each individualn, then sum log-transformed marginal likelihoods across all individuals.
Value
A single numeric value representing the total observed-data log-likelihood:
\begin{aligned}
\log \mathcal{L}(\boldsymbol{\theta}) &=
\sum_{n=1}^N \log \Biggl[
\sum_{\mathbf{z}_n \in \{1,\dots,L\}^T}
\Bigl( \prod_{t=1}^T \text{CEP}_t(z_{nt}, \hat{z}_{nt}) \Bigr) \cdot \\
&\quad P(Z_{n1}=z_{n1} \mid \mathbf{X}_{n1}) \cdot
\prod_{t=2}^T P(Z_{nt}=z_{nt} \mid Z_{n,t-1}=z_{n,t-1}, \mathbf{X}_{nt})
\Biggr]
\end{aligned}
where \mathbf{z}_n = (z_{n1},\dots,z_{nT}) is a latent class path, \hat{z}_{nt} = \texttt{Zs[[t]][n]} is the modal assignment,
and \boldsymbol{\theta} denotes all model parameters (beta, gama).
Note
When no covariates are included:
Initial probabilities reduce to
P(Z_{n1} = l) = \pi_l(multinomial probabilities)Transition probabilities reduce to
P(Z_{nt} = l \mid Z_{n,t-1} = k) = \tau_{kl}^{(t)}(time-specific Markov transition probabilities)
See Also
LTA for three-step LTA estimation,
get.CEP for CEP matrix computation
Compute Posterior Latent Class Probabilities Based on Fixed Parameters
Description
Computes posterior probabilities of latent class membership while simultaneously
re-estimating class prevalences via an EM algorithm. Unlike standard posterior
computation, this function iteratively updates class prevalences (\pi_l)
using fixed conditional response probabilities (par).
Usage
get.P.Z.Xn.LCA(response, par, tol = 1e-10, maxiter = 2000, vis = TRUE)
Arguments
response |
Numeric matrix ( |
par |
3D array (
|
tol |
Convergence tolerance for absolute change in log-likelihood. Default: 1e-10. |
maxiter |
Maximum EM iterations. Default: 2000. |
vis |
Logical: show iteration progress? Default: TRUE. |
Details
Response categories are standardized to 0-based integers using
adjust.response.Class prevalences are initialized uniformly (
\pi_l^{(0)} = 1/L).Numerical stability: Small constants (1e-50) prevent division by zero.
Termination occurs when:
-
|\log L^{(t)} - \log L^{(t-1)}| < \code{tol}(log-likelihood change) Maximum iterations reached
-
Value
Numeric matrix (N \times L) of posterior probabilities.
Rows sum to 1. Columns named "Class.1", "Class.2", etc.
Algorithm
The function implements an EM algorithm with:
- E-step
Compute posterior probabilities for observation
nand classl:P(Z_n = l \mid \mathbf{X}_n) = \frac{\pi_l^{(t)} \prod_{i=1}^I P(X_{ni} = x_{ni} \mid Z_n = l)} {\sum_{k=1}^L \pi_k^{(t)} \prod_{i=1}^I P(X_{ni} = x_{ni} \mid Z_n = k)}- M-step
Update class prevalences:
\pi_l^{(t+1)} = \frac{1}{N} \sum_{n=1}^N P(Z_n = l \mid \mathbf{X}_n)
Convergence is determined by the absolute change in log-likelihood between iterations.
Examples
library(LCPA)
set.seed(123)
data.obj <- sim.LCA(N = 200, I = 3, L = 2, IQ = 0.85) # From LCPA
fit <- LCA(data.obj$response, L = 2, method = "EM", nrep = 5) # From LCPA
P.Z.Xn <- get.P.Z.Xn.LCA(
response = data.obj$response,
par = fit$params$par # Fixed conditional probabilities
)
head(P.Z.Xn)
Compute Posterior Latent Profile Probabilities Based on Fixed Parameters
Description
Computes posterior probabilities of latent profile membership while simultaneously
re-estimating profile prevalences via an EM algorithm. Unlike standard posterior
computation, this function iteratively updates profile prevalences (\pi_l)
using fixed profile characteristics (means and covariances).
Usage
get.P.Z.Xn.LPA(response, means, covs, tol = 1e-10, maxiter = 2000, vis = TRUE)
Arguments
response |
Numeric matrix ( |
means |
Numeric matrix (
Row |
covs |
3D array (
Each slice must be symmetric and positive definite (after jittering). |
tol |
Convergence tolerance for absolute change in log-likelihood. Default: 1e-10. |
maxiter |
Maximum EM iterations. Default: 2000. |
vis |
Logical: show iteration progress? Default: TRUE. |
Details
Numerical stability:
Covariance matrices are jittered with
tolfor positive definitenessLog-space computation with log-sum-exp trick
Uniform probabilities used as fallback for non-finite densities
Profile prevalences are initialized uniformly (
\pi_l^{(0)} = 1/L).Termination occurs when:
-
|\log L^{(t)} - \log L^{(t-1)}| < \code{tol}(log-likelihood change) Maximum iterations reached
-
Value
Numeric matrix (N \times L) of posterior probabilities.
Rows sum to 1. Columns named "Class.1", "Class.2", etc.
Algorithm
The function implements an EM algorithm with:
- E-step
Compute posterior probabilities for observation
nand profilel:P(Z_n = l \mid \mathbf{x}_n) = \frac{\pi_l^{(t)} \cdot \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l)} {\sum_{k=1}^L \pi_k^{(t)} \cdot \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}- M-step
Update profile prevalences:
\pi_l^{(t+1)} = \frac{1}{N} \sum_{n=1}^N P(Z_n = l \mid \mathbf{x}_n)
Convergence is determined by the absolute change in log-likelihood between iterations.
Examples
library(LCPA)
set.seed(123)
data.obj <- sim.LPA(N = 300, I = 2, L = 2, constraint = "VV") # From LCPA
fit <- LPA(data.obj$response, L = 2, method = "EM", nrep = 5) # From LCPA
P.Z.Xn <- get.P.Z.Xn.LPA(
response = data.obj$response,
means = fit$params$means, # Fixed profile means
covs = fit$params$covs # Fixed profile covariances
)
head(P.Z.Xn)
Compute Standard Errors
Description
Computes approximate standard errors (SEs) for estimated parameters in Latent Class Analysis (LCA) or Latent Profile Analysis (LPA) models using two methods:
-
"Bootstrap": Non-parametric bootstrap with label-switching correction. McLachlan & Peel (2004) suggest that 50–100 replicates often provide adequate accuracy for practical purposes, though more (e.g., 500–1000) may be preferred for publication-quality inference. -
"Obs": Numerical evaluation of the observed information matrix (Hessian of negative log-likelihood)
Users should note that get.SE computes standard errors based on the observed
information matrix via numerical differentiation, which may lack precision and often yields
ill-conditioned matrices. Therefore, we recommend using method = "Bootstrap".
Usage
get.SE(object, method = "Bootstrap", n.Bootstrap = 100, vis = TRUE)
Arguments
object |
|
method |
Character specifying SE calculation method: |
n.Bootstrap |
Integer. Number of bootstrap replicates when |
vis |
Logical. If |
Value
A list of class "SE" containing:
seNamed list of SEs matching parameter structure of input model:
LPA:
means(matrix: classes x variables),covs(array: vars x vars x classes),P.Z(vector: class prob SEs)LCA:
par(array: classes x items x categories),P.Z(vector: class prob SEs)Critical Note for
"Obs"method: Only free parameters have non-zero SEs. Non-free parameters (e.g., last class probability inP.Zdue to sum-to-1 constraint; last category probability in LCA items) have SE=0. Bootstrap provides SEs for all parameters.
vcovNULLfor bootstrap. For"Obs": variance-covariance matrix (may be regularized). Diagonal contains squared SEs of free parameters.hessianNULLfor bootstrap. For"Obs": observed information matrix (pre-regularization). Dimension = number of free parameters.diagnosticsMethod-specific diagnostics:
Bootstrap:
n.Bootstrap.requested,n.Bootstrap.completedObs: Hessian computation details, condition number, regularization status, step sizes
callFunction call that generated the object
argumentsList of input arguments
References
McLachlan, G. J., & Peel, D. (2004). Finite Mixture Models. Wiley. https://books.google.com.sg/books?id=c2_fAox0DQoC
Examples
library(LCPA)
set.seed(123)
# LPA with Bootstrap (minimal replicates for example)
lpa_data <- sim.LPA(N = 500, I = 4, L = 3)
lpa_fit <- LPA(lpa_data$response, L = 3)
se_boot <- get.SE(lpa_fit, method = "Bootstrap", n.Bootstrap = 10)
print(se_boot)
extract(se_boot, "covs")
# LCA with Observed Information (note zeros for constrained parameters)
lca_data <- sim.LCA(N = 500, I = 4, L = 3, poly.value = 5)
lca_fit <- LCA(lca_data$response, L = 3)
se_obs <- get.SE(lca_fit, method = "Obs")
print(se_obs)
extract(se_obs, "par")
Calculate Classification Entropy
Description
Computes the relative entropy statistic to evaluate classification quality in Latent Class Analysis (LCA) or Latent Profile Analysis (LPA) models. Entropy measures how accurately cases are assigned to latent classes based on posterior probabilities, with values closer to 1 indicating better separation between classes.
Usage
get.entropy(object)
Arguments
object |
An object of class
|
Value
A numeric value between 0 and 1 representing the relative entropy (Nylund-Gibson et al., 2018; Clark et al., 2013):
1.0: Perfect classification (each case belongs exclusively to one class)
0.8-1.0: Good classification quality
0.6-0.8: Moderate classification quality
< 0.6: Poor classification quality (consider model simplification)
Calculated using the formula:
1 - \frac{\sum_{n=1}^N \sum_{l=1}^L -p_{nl} \ln(p_{nl})}{N \ln(L)}
where:
-
N= Sample size -
L= Number of latent classes -
p_{nl}= Posterior probability of observationnbelonging to classl
Note
The calculation includes a small constant (1e-10) to avoid log(0)
instability when posterior probabilities are exactly zero.
Values should be interpreted alongside other diagnostics (BIC, bootstrapped LRT) as high entropy alone doesn't guarantee model validity. Low entropy may indicate:
Overly complex model (too many classes)
Poorly measured latent constructs
Violation of local independence assumption
References
Nylund-Gibson, K., & Choi, A. Y. (2018). Ten frequently asked questions about latent class analysis. Translational Issues in Psychological Science, 4(4), 440-461. https://doi.org/10.1037/tps0000176
Clark, S. L., Muthén, B., Kaprio, J., D'Onofrio, B. M., Viken, R., & Rose, R. J. (2013). Models and Strategies for Factor Mixture Analysis: An Example Concerning the Structure Underlying Psychological Disorders. Structural Equation Modeling: A Multidisciplinary Journal, 20(4), 681-703. https://doi.org/10.1080/10705511.2013.824786
Examples
# Example with simulated data
set.seed(123)
data.obj <- sim.LCA(N = 500, I = 4, L = 2, IQ=0.9)
response <- data.obj$response
# Fit 2-class model with EM algorithm
fit.em <- LCA(response, L = 2, method = "EM", nrep = 10)
entropy_value <- get.entropy(fit.em)
cat("Classification entropy:", round(entropy_value, 3), "\n")
Calculate Fit Indices
Description
Computes a comprehensive set of model fit indices for objects returned by
LCA or LPA. These indices balance model
fit (log-likelihood) with model complexity (number of parameters) to facilitate
model selection. All indices are derived from the observed-data log-likelihood
and parameter count.
Usage
get.fit.index(object)
Arguments
object |
An object of class
|
Value
An object of class "fit.index" containing:
- npar
Number of free parameters in the model
- Log.Lik
Log-likelihood of the model:
\log \mathcal{L}- -2LL
Deviance statistic:
-2 \log \mathcal{L}-2 \sum_{n=1}^N \log \left[ \sum_{l=1}^L \pi_l \cdot f(\mathbf{x}_n \mid \boldsymbol{\theta}_l) \right], where
\pi_lis the prior probability of classl,f(\cdot)is the probability density/mass function (multivariate normal for LPA, multinomial for LCA), and\boldsymbol{\theta}_lare class-specific parameters.- AIC
Akaike Information Criterion:
\mathrm{AIC} = -2 \log \mathcal{L} + 2k, wherenpar= number of free parameters. Lower values indicate better fit.- BIC
Bayesian Information Criterion:
\mathrm{BIC} = -2 \log \mathcal{L} + npar \log(N), whereN= sample size. Incorporates stronger penalty for complexity than AIC.- SIC
Sample-Size Adjusted BIC:
\mathrm{SIC} = -\frac{1}{2} \mathrm{BIC}. Equivalent to\log \mathcal{L} - \frac{npar}{2} \log(N). Often used in latent class modeling.- CAIC
Consistent AIC:
\mathrm{CAIC} = -2 \log \mathcal{L} + npar \left[ \log(N) + 1 \right]. Consistent version of AIC that converges to true model asN \to \infty.- AWE
Approximate Weight of Evidence:
\mathrm{AWE} = -2 \log \mathcal{L} + 1.5k \left[ \log(N) + 1 \right]. Penalizes complexity more heavily than CAIC.- SABIC
Sample-Size Adjusted BIC:
\mathrm{SABIC} = -2 \log \mathcal{L} + npar \log \left( \frac{N + 2}{24} \right). Recommended for latent class/profile analysis with moderate sample sizes.
Examples
# Fit LPA model
set.seed(123)
data.obj <- sim.LPA(N = 100, I = 3, L = 2, constraint = "E0")
fit <- LPA(data.obj$response, L = 2, constraint = "VV", method = "EM")
# Compute fit indices
fit_indices <- get.fit.index(fit)
fit_indices
extract(fit_indices, "SABIC")
Calculate Number of Free Parameters in Latent Class Analysis
Description
Computes the total number of free parameters in an LCA model based on the number of categories per observed variable and the number of latent classes. This follows standard LCA parameterization with local independence assumption.
Usage
get.npar.LCA(poly.value, L)
Arguments
poly.value |
A numeric vector of length |
L |
Integer specifying the number of latent classes. |
Details
Parameter count derivation:
- Fixed components (always present):
-
Conditional response probabilities:
\sum_{i=1}^I (L \times K_i - 1)parametersIndependent class proportions:
L-1parameters (since\sum_{l=1}^L \pi_l = 1)
- Per-variable parameterization:
-
For each observed variable
iwithK_icategories:Each latent class requires
K_iconditional probabilitiesP(X_i=k|Z=l)With constraints
\sum_{k=1}^{K_i} P(X_i=k|Z=l) = 1for each classlGlobal constraints reduce total parameters to
L \times K_i - 1per variable
Value
Integer representing the total number of free parameters in the model:
\text{npar} = \sum_{i=1}^I \underbrace{(L \times K_i - 1)}_{\text{free parameters}} + \underbrace{(L-1)}_{\text{class proportions}}
Examples
# Example 1: 3 binary variables (K_i=2), 2 latent classes
poly.value <- c(2, 2, 2) # Three binary variables
L <- 2
npar <- sum(poly.value * L - 1) + (L - 1) # = (4-1)+(4-1)+(4-1) + 1 = 3+3+3+1 = 10
get.npar.LCA(poly.value, L) # Returns 10
# Example 2: Mixed variable types (binary, ternary, quaternary)
poly.value <- c(2, 3, 4) # Variables with 2, 3, and 4 categories
L <- 3
npar <- sum(poly.value * L - 1) + (L - 1) # = (6-1)+(9-1)+(12-1) + 2 = 5+8+11+2 = 26
get.npar.LCA(poly.value, L) # Returns 26
# Example 3: Single polytomous variable with 5 categories, 4 latent classes
poly.value <- 5
L <- 4
npar <- sum(poly.value * L - 1) + (L - 1) # = (20-1) + 3 = 19+3 = 22
get.npar.LCA(poly.value, L) # Returns 22
Calculate Number of Free Parameters in Latent Profile Analysis
Description
Computes the total number of free parameters in an LPA model based on the number of observed
variables (I), number of latent profiles (L), and covariance structure constraints.
Usage
get.npar.LPA(I, L, constraint = "VV")
Arguments
I |
Integer specifying the number of continuous observed variables. |
L |
Integer specifying the number of latent profiles. |
constraint |
Character string specifying covariance structure constraints. Supported options:
Default: |
Details
Parameter count breakdown:
Fixed components (always present):
Profile-specific means:
L \times IparametersIndependent class proportions:
L-1parameters (since\sum_{l=1}^L \pi_l = 1)
Covariance parameters (varies by constraint):
I = 1:-
-
"UE": 1 shared variance parameter -
"UV":Lprofile-specific variance parameters
-
I > 1:-
-
"E0":Ishared variance parameters (no covariances) -
"V0":L \times Iprofile-specific variance parameters (no covariances) -
"EE":\frac{I(I+1)}{2}parameters for one shared full covariance matrix -
"VE":L \times Idiagonal variances (free per profile) +\frac{I(I-1)}{2}off-diagonal covariances (shared across profiles) -
"EV":Idiagonal variances (shared across profiles) +L \times \frac{I(I-1)}{2}off-diagonal covariances (free per profile) -
"VV":L \times \frac{I(I+1)}{2}parameters forLdistinct full covariance matrices
-
Value
Integer representing the total number of free parameters in the model:
\text{npar} = \underbrace{L \times I}_{\text{means}} + \underbrace{(L-1)}_{\text{class proportions}} + \underbrace{\text{covariance parameters}}_{\text{depends on constraint}}
Note
Important considerations:
For
I = 1, only"UE"and"UV"are meaningful;"EE","E0","VV","V0", etc., are treated as"UE"or"UV"respectively.Covariance parameters count only free elements in symmetric matrices (diagonal + upper triangle).
If an user-defined
constraintis provided, the function defaults to"VV"behavior but subtracts(L-1) \times \text{length(constraint)}.-
"VE"and"EV"constraints requireI > 1to be meaningful (otherwise no covariances exist).
Examples
# Univariate examples (I=1)
get.npar.LPA(I = 1, L = 2, constraint = "UE")
get.npar.LPA(I = 1, L = 3, constraint = "UV")
# Multivariate examples (I=3)
get.npar.LPA(I = 3, L = 2, constraint = "E0")
get.npar.LPA(I = 3, L = 2, constraint = "V0")
get.npar.LPA(I = 3, L = 2, constraint = "EE")
get.npar.LPA(I = 3, L = 2, constraint = "VV")
get.npar.LPA(I = 3, L = 2, constraint = "VE")
get.npar.LPA(I = 3, L = 2, constraint = "EV")
# User defined example
get.npar.LPA(I = 3, L = 2, constraint = list(c(1, 2), c(3, 3)))
Calculate Number of Free Parameters in Latent Transition Analysis
Description
Computes the total number of free parameters in a Latent Transition Analysis (LTA) model estimated via the three-step approach. The count depends on the number of latent classes, the number of time points, the number of covariates at each time point, and whether transition coefficients are constrained to be equal across time.
Usage
get.npar.LTA(covariates.ncol, L, covariates.timeCross = FALSE)
Arguments
covariates.ncol |
An integer vector of length |
L |
Integer scalar. Number of latent classes ( |
covariates.timeCross |
Logical. If |
Details
Parameterization:
- Initial status model (time 1):
-
Multinomial logit model with
Lclasses (last class is reference). Number of free parameters:M_1 \times (L-1). - Transition models (time
t \to t+1): -
For each transition, a multinomial logit model conditioned on previous class. For each origin class
kand destination classl(l \neq L), there is a coefficient vector of lengthM_{t+1}. Total per transition:L \times (L-1) \times M_{t+1}parameters. The constraintcovariates.timeCrossdetermines whether these parameters are shared across transitions.
Value
Integer representing the total number of free parameters:
npar = M_1 \times (L-1) + \begin{cases}
L \times (L-1) \times M_2 & \text{if } T>1 \text{ and time-invariant effects} \\
\sum_{t=2}^T L \times (L-1) \times M_t & \text{if } T>1 \text{ and time-varying effects} \\
0 & \text{if } T=1
\end{cases}
where:
-
time-invariant effects corresponds to
covariates.timeCross = TRUE -
time-varying effects corresponds to
covariates.timeCross = FALSE
Note
Critical assumptions:
The last latent class (
L) is always the reference category for all logits.When
covariates.timeCross = TRUE, it is assumed that all time points after the first have identical covariate structures (M_2 = M_3 = \dots = M_T). If violated, the function usesM_1for all transitions to matchLTA's internal behavior.For
T=1, no transition parameters are estimated (pure latent class/profile analysis).
Examples
# Example 1: 2 time points, 2 classes, time-invariant transition coefficients
# Time1: 2 covariates (intercept + 1 predictor)
# Time2: 3 covariates (but constrained to match Time1 due to timeCross=TRUE)
covariates.ncol <- c(2, 3)
L <- 2
get.npar.LTA(covariates.ncol, L, covariates.timeCross = TRUE)
# Example 2: Same as above but time-varying coefficients
get.npar.LTA(covariates.ncol, L, covariates.timeCross = FALSE)
# Example 3: 3 time points, 3 classes, time-invariant coefficients
covariates.ncol <- c(2, 2, 2) # All time points have identical covariates
L <- 3
get.npar.LTA(covariates.ncol, L, covariates.timeCross = TRUE)
# Example 4: 3 time points, 3 classes, time-varying coefficients
covariates.ncol <- c(2, 3, 4)
L <- 3
get.npar.LTA(covariates.ncol, L, covariates.timeCross = FALSE)
# Example 5: Single time point (equivalent to LCA)
covariates.ncol <- c(3)
L <- 4
get.npar.LTA(covariates.ncol, L)
Install Required Python Dependencies for Neural Latent Variable Models
Description
Checks whether five essential Python packages required to run neural latent variable models
(e.g., LCAnet, LPAnet) are installed in the current Python environment. If any are missing,
the user is interactively prompted to install them via reticulate::py_install().
The targeted packages are:
-
numpy— Fundamental package for numerical computing in Python. -
torch— PyTorch deep learning framework (supports CPU/GPU computation). -
matplotlib— 2D plotting and visualization library. -
scikit-learn— Machine learning utilities (used here primarily for KMeans initialization). -
scipy— Scientific computing and advanced linear algebra routines.
For torch, users can choose between CPU-only or GPU-enabled versions (with CUDA support).
Available CUDA versions are filtered by OS compatibility.
This function is especially useful when deploying models that bridge R and Python via reticulate, ensuring all backend dependencies are met before model execution.
Usage
install_python_dependencies()
Details
The function performs the following steps for each dependency:
Uses
reticulate::py_module_available()to test if the module is importable.If not available, prints a message describing the package’s purpose.
Prompts the user interactively (via
readline) whether to proceed with installation.For
torch, offers CPU/GPU choice and CUDA version selection if GPU is chosen.Installs the package using
reticulate::py_install()with appropriate index URL if needed.Returns a logical list indicating initial installation status of each package.
Note: This function requires reticulate to be loaded and a valid Python environment configured. It does NOT automatically install reticulate or configure Python — that must be done separately.
Value
A named list of logical values indicating whether each package was already installed before running this function:
numpy_installed |
Logical. Was |
torch_installed |
Logical. Was |
matplotlib_installed |
Logical. Was |
sklearn_installed |
Logical. Was |
scipy_installed |
Logical. Was |
Examples
library(reticulate)
# Ensure reticulate is loaded and Python is configured
# need python
## Not run:
# Run dependency installer
deps <- install_python_dependencies()
# Check which were missing
print(deps)
## End(Not run)
Compute the Logistic (Sigmoid) Function
Description
This function computes the logistic (also known as sigmoid) transformation of the input. The logistic function maps real-valued numbers to the open interval (0, 1), and is widely used in machine learning, statistical modeling (e.g., logistic regression), and neural networks as an activation function or link function.
Usage
logit(x)
Arguments
x |
A numeric vector, matrix, or array. Accepts any real number, including |
Details
The logistic function is defined as:
\mathrm{logit}^{-1}(x) = \frac{1}{1 + e^{-x}}
Note: Despite the name "logit", this function actually computes the inverse logit (i.e., the
logistic function). The true logit function is the inverse: \log(p / (1 - p)).
However, in many applied contexts—especially in software—the term "logit" is sometimes
informally used to refer to the sigmoid. For clarity, this implementation follows the
conventional definition of the logistic/sigmoid function.
Value
A numeric object of the same dimension as x, where each element is the
logistic transformation of the corresponding input:
If
x = 0, returns0.5As
x -> Inf, output approaches1As
x -> -Inf, output approaches0-
NAvalues remainNA
Examples
logit(0) # 0.5
logit(c(-Inf, 0, Inf)) # c(0, 0.5, 1)
logit(c(-2, -1, 0, 1, 2))
Column-wise Z-Score Standardization
Description
Standardizes each column of a numeric matrix or data frame to have mean zero and standard deviation one. This transformation is essential for many multivariate techniques that assume standardized inputs. The function preserves all dimension names and returns a pure numeric matrix with attributes storing original column means and standard deviations.
Usage
normalize(response)
Arguments
response |
A numeric matrix or data frame of dimension
Non-numeric columns will be coerced to numeric with a warning. Missing values are not allowed
and will cause the function to fail. Constant columns (zero variance) will produce |
Value
A standardized numeric matrix of dimension N \times I with attributes:
-
scaled:center: Vector of original column means (\mu_i) -
scaled:scale: Vector of original column standard deviations (\sigma_i) Row names: Preserved from original input's row names or row indices
Column names: Preserved from original input's column names
Values: Z-scores calculated as
z_{ni} = \frac{x_{ni} - \mu_i}{\sigma_i}
where:
-
x_{ni}= original value for observationnand variablei -
\mu_i= sample mean of variablei:\mu_i = \frac{1}{N}\sum_{n=1}^{N}x_{ni} -
\sigma_i= sample standard deviation of variablei:\sigma_i = \sqrt{\frac{1}{N-1}\sum_{n=1}^{N}(x_{ni} - \mu_i)^2}
The denominator N-1 provides an unbiased estimator of population variance.
Mathematical Details
For each column i in the input matrix X, the standardization is performed as:
Z_{\cdot i} = \frac{X_{\cdot i} - \bar{X}_{\cdot i}}{S_{X_{\cdot i}}}
where:
-
X_{\cdot i}is thei-th column vector ofX -
\bar{X}_{\cdot i}is the sample mean of columni -
S_{X_{\cdot i}}is the sample standard deviation of columni
The resulting matrix Z has the properties:
\frac{1}{N}\sum_{n=1}^{N}z_{ni} = 0 \quad \text{and} \quad \sqrt{\frac{1}{N-1}\sum_{n=1}^{N}z_{ni}^2} = 1
for all i = 1, \ldots, I.
Examples
# Basic usage with matrix
set.seed(123)
mat <- matrix(rnorm(30, mean = 5:7, sd = 1:3), ncol = 3,
dimnames = list(paste0("Obs", 1:10), paste0("Var", 1:3)))
norm_mat <- normalize(mat)
# Verify attributes
attr(norm_mat, "scaled:center") # Original column means
attr(norm_mat, "scaled:scale") # Original column standard deviations
# Verify properties
apply(norm_mat, 2, mean) # Should be near zero
apply(norm_mat, 2, sd) # Should be exactly 1
# With data frame input
df <- as.data.frame(mat)
norm_df <- normalize(df)
all.equal(norm_mat, norm_df, check.attributes = FALSE) # Should be identical
# Handling constant columns (produces NaN)
const_mat <- cbind(mat, Constant = rep(4.2, 10))
normalize(const_mat)
Visualize Response Distributions with Density Plots
Description
Creates a publication-quality density plot showing the distribution of responses across multiple items/variables. Automatically handles variable ordering, color scaling, and legend layout based on the number of variables.
Usage
plotResponse(response)
Arguments
response |
A matrix or data frame containing response data where:
Non-numeric columns (except row identifiers) will cause errors. |
Value
A ggplot object containing:
Density curves for each variable colored by item
Adaptive color schemes based on number of variables
Optimized legend layout for large numbers of items
Publication-ready theme with grid lines and clean styling
The plot can be further customized using standard ggplot2 syntax.
Theming Details
The plot uses a minimal theme with:
Light grey grid lines for readability
Black axis lines and ticks (0.8pt thickness)
White background with no panel border
Optimized font sizes (13pt axis titles, 11pt tick labels)
Legend positioned on right with adaptive sizing
Examples
# Simulate response data for 5 items
set.seed(42)
resp_data <- data.frame(
item1 = rnorm(100, mean = 3, sd = 1),
item2 = rnorm(100, mean = 2, sd = 0.8),
item3 = rnorm(100, mean = 4, sd = 1.2),
item4 = rnorm(100, mean = 3.5, sd = 0.9),
item5 = rnorm(100, mean = 2.5, sd = 1.1)
)
library(LCPA)
# Generate and display plot
p <- plotResponse(resp_data)
print(p)
# For data with many items (18 items example)
many_items <- as.data.frame(replicate(18, rnorm(50, mean = runif(1, 1, 5), sd = 1)))
names(many_items) <- paste0("Q", 1:18)
p_large <- plotResponse(many_items)
print(p_large)
S3 Methods: print
Description
Provides user-friendly, formatted console output for objects generated by the LCPA package.
This generic function dispatches to class-specific methods that display concise summaries of model results,
simulated datasets, fit indices, model comparisons, and standard errors. Designed for interactive use
and quick diagnostic inspection.
Usage
## S3 method for class 'LCA'
print(x, ...)
## S3 method for class 'summary.LCA'
print(x, ...)
## S3 method for class 'LPA'
print(x, ...)
## S3 method for class 'summary.LPA'
print(x, ...)
## S3 method for class 'LTA'
print(x, ...)
## S3 method for class 'summary.LTA'
print(x, ...)
## S3 method for class 'LCPA'
print(x, ...)
## S3 method for class 'summary.LCPA'
print(x, ...)
## S3 method for class 'sim.LCA'
print(x, ...)
## S3 method for class 'summary.sim.LCA'
print(x, ...)
## S3 method for class 'sim.LPA'
print(x, ...)
## S3 method for class 'summary.sim.LPA'
print(x, ...)
## S3 method for class 'sim.LTA'
print(x, ...)
## S3 method for class 'summary.sim.LTA'
print(x, ...)
## S3 method for class 'fit.index'
print(x, ...)
## S3 method for class 'summary.fit.index'
print(x, ...)
## S3 method for class 'compare.model'
print(x, ...)
## S3 method for class 'summary.compare.model'
print(x, ...)
## S3 method for class 'SE'
print(x, digits = 4, I.max = 5, L.max = 3, ...)
## S3 method for class 'summary.SE'
print(x, ...)
Arguments
x |
An object of one of the following classes:
|
... |
Additional arguments passed to methods (currently ignored in most cases). |
digits |
Number of decimal places for numeric output (default: varies by method, often 4).
Used by: |
I.max |
Maximum number of variables/items to display before truncation (default: varies, e.g., 5).
Used by: |
L.max |
Maximum number of latent classes/profiles to display before truncation (default: varies, e.g., 3).
Used by: |
Details
Each method produces a structured, human-readable summary optimized for its object type:
- Model Objects (
LCA/LPA/LCPA/LTA) -
Invokes
summary()internally and prints comprehensive output including:Model call and configuration (method, constraints, reference class)
Data characteristics (N, I, time points, distribution)
Fit statistics (LogLik, AIC, BIC, entropy, npar)
Class/profile prior probabilities and frequencies
Item-response probabilities (
LCA) or profile means (LPA)For
LCPA/LTA: regression coefficients with significance markers and 95% CIsConvergence diagnostics (iterations, tolerance, hardware)
Replication details (if
nrep > 1)
- Simulation Objects (
sim.LCA/sim.LPA/sim.LTA) -
Displays simulation design and true parameter structure:
Configuration (N, I, L, times, constraint, distribution)
True class/profile proportions and observed frequencies
For
sim.LCA: item category structure and conditional probabilitiesFor
sim.LPA: covariance constraint description and mean rangesFor
sim.LTA: transition mode (fixed/covariate), initial/transition coefficients
Output is truncated for high-dimensional structures using
I.maxandL.max. - Fit Index Objects (
fit.index) -
Presents a clean table of model fit criteria:
Header with dimensions (N, I, L, npar)
Formatted table: AIC, BIC, SABIC, CAIC, AWE, -2LL, SIC
Interpretation note: “Lower values preferred for ICs”
Values rounded to
digitsdecimal places
- Model Comparison Objects (
compare.model) -
Compares two nested models with statistical tests:
Comparative fit table (npar, LogLik, AIC, BIC, entropy)
Classification quality (AvePP per class, overall entropy)
Bayes Factor with interpretive guidance
Likelihood ratio tests (standard, VLMR, Bootstrap) with p-values and significance codes
Clear section headers and visual separators
- Standard Error Objects (
SE) -
Displays uncertainty estimates for model parameters:
Class probability SEs (always fully shown)
Profile means SEs (
LPA) or item-response SEs (LCA), truncated byL.max/I.maxCovariance SE summary (non-zero count only; full access via
extract())Diagnostics: Bootstrap completion % or Hessian condition number with stability warnings
- Summary Objects
-
All
summary.*methods are called internally by their correspondingprint.*methods. They pre-compute and structure output for consistent formatting. Direct calls are also supported.
Value
Invisibly returns the input object x. No data is modified.
Methods (by class)
-
print(LCA): Print method forLCAobjects -
print(summary.LCA): Print method forsummary.LCAobjects -
print(LPA): Print method forLPAobjects -
print(summary.LPA): Print method forsummary.LPAobjects -
print(LTA): Print method forLTAobjects -
print(summary.LTA): Print method forsummary.LTAobjects -
print(LCPA): Print method forLCPAobjects -
print(summary.LCPA): Print method forsummary.LCPAobjects -
print(sim.LCA): Print method forsim.LCAobjects -
print(summary.sim.LCA): Print method forsummary.sim.LCAobjects -
print(sim.LPA): Print method forsim.LPAobjects -
print(summary.sim.LPA): Print method forsummary.sim.LPAobjects -
print(sim.LTA): Print method forsim.LTAobjects -
print(summary.sim.LTA): Print method forsummary.sim.LTAobjects -
print(fit.index): Print method forfit.indexobjects -
print(summary.fit.index): Print method forsummary.fit.indexobjects -
print(compare.model): Print method forcompare.modelobjects -
print(summary.compare.model): Print method forsummary.compare.modelobjects -
print(SE): Print method forSEobjects -
print(summary.SE): Print method for summary.SE objects
Output Conventions
Numeric values are typically rounded to 4 decimal places unless overridden by
digits.Large matrices (e.g., item parameters, transition coefficients) are truncated with clear messages.
Significance markers:
***(<0.001),**(<0.01),*(<0.05),.(<0.1).95% confidence intervals computed as: Estimate ± 1.96 × Std_Error.
Reference classes (for multinomial models) are explicitly stated.
Warnings appear for unstable SEs (high condition number) or incomplete Bootstrap runs.
Generate Random Samples from the Dirichlet Distribution
Description
rdirichlet generates n random observations from a Dirichlet distribution
with a specified concentration parameter vector alpha.
Usage
rdirichlet(n, alpha)
Arguments
n |
Integer. The number of random vectors to generate. |
alpha |
Numeric vector. The concentration parameters (must be positive).
The length of this vector determines the number of dimensions |
Details
The Dirichlet distribution is a family of continuous multivariate probability distributions
parameterized by a vector \alpha of positive reals. It is the multivariate
generalization of the beta distribution and is commonly used as a conjugate prior
to the multinomial distribution in Bayesian statistics.
Probability Density Function:
For a vector x = (x_1, \dots, x_K) on the unit simplex (where \sum x_i = 1
and x_i \ge 0), the density is given by:
f(x_1, \dots, x_K; \alpha_1, \dots, \alpha_K) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1}
where the normalizing constant B(\alpha) is the multivariate beta function:
B(\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^{K} \alpha_i)}
Simulation Method:
The function utilizes the property that if Y_1, \dots, Y_K are independent
Gamma random variables such that Y_i \sim Gamma(shape = \alpha_i, rate = 1), then:
X_i = \frac{Y_i}{\sum_{j=1}^{K} Y_j}
The resulting vector (X_1, \dots, X_K) follows a Dirichlet distribution with parameters \alpha.
Value
A matrix with n rows and length(alpha) columns.
Each row sums to 1, representing a single sample from the Dirichlet distribution.
Examples
# Generate 5 samples from a 3-dimensional Dirichlet distribution
set.seed(123)
alpha_params <- c(1, 2, 5)
result <- rdirichlet(n = 5, alpha = alpha_params)
print(result)
# Check that rows sum to 1
rowSums(result)
Simulate Data for Latent Class Analysis
Description
Generates synthetic multivariate categorical data from a latent class model with L latent classes.
Each observed variable follows a multinomial distribution within classes, with flexible control over
class separation via the IQ parameter and class size distributions.
Usage
sim.LCA(
N = 1000,
I = 10,
L = 3,
poly.value = 5,
IQ = "random",
distribution = "random",
params = NULL,
is.sort = TRUE
)
Arguments
N |
Integer; total number of observations to simulate. Must be |
I |
Integer; number of categorical observed variables. Must be |
L |
Integer; number of latent classes. Must be |
poly.value |
Integer or integer vector; number of categories (levels) for each observed variable.
If scalar, all variables share the same number of categories. If vector, must have length |
IQ |
Character or numeric; controls category probability distributions:
|
distribution |
Character; distribution of class sizes. Options: |
params |
List with fixed parameters for simulation:
|
is.sort |
A logical value. If |
Details
Probability Generation:
Dirichlet Sampling (
IQ="random"): For each variable-class combination, probabilities are drawn from\text{Dirichlet}(\alpha_1=3, \dots, \alpha_k=3)wherek = \text{poly.value}[i].High-Discrimination Mode (
IQ=numeric): For each variablei:Generate special probabilities
par.specialof lengthLcontaining:IQ,1-IQ, andL-2values uniformly sampled from[1-IQ, IQ].For each class
l, assignpar.special[l]to one category, distribute remaining probability1 - \text{par.special}[l]uniformly (via Dirichlet) across other categories.Shuffle category assignments to avoid position bias.
Data Generation:
Class assignments
Zare generated first according todistribution.For each observation
pand variablei:Retrieve cumulative probabilities for class
Z[p]Draw uniform random number
u \sim \text{Uniform}(0,1)Assign category
kwhereP(\text{category} \leq k-1) < u \leq P(\text{category} \leq k)
Entire dataset is regenerated if any category of any variable has zero observations.
Critical Constraints:
When
IQis numeric:0.5 < IQ < 1andmin(poly.value) >= 2-
Nmust be sufficiently large to observe all categories, especially whenIQis high orpoly.valueis large. Simulation may fail for smallN. For
distribution="uniform", empty classes possible whenN < L.
Value
A list containing:
- response
Integer matrix (
N \times I) of simulated observations. Rows are observations (named"O1","O2", ...), columns are variables named"I1","I2", ... Values range from0topoly.value[i]-1.- par
Array (
L \times I \times K) of true class-specific category probabilities, whereK = \text{max}(poly.value)(i.e., the maximum number of categories across variables). Dimensions: classes x variables x categories. Note: For variables withpoly.value[i] < K, unused category dimensions containNA. Dimension names:"L1","L2", ... (classes);"I1","I2", ... (variables);"poly0","poly1", ... (categories).- Z
Integer vector (length
N) of true class assignments (1 to L). Named with observation IDs (e.g.,"O1").- P.Z
Numeric vector (length
L) of true class proportions. Named with class labels (e.g.,"L1").- poly.value
Integer vector (length
I) specifying number of categories per variable.- P.Z.Xn
Binary matrix (
N \times L) of true class membership indicators (one-hot encoded). Rowi, columnl= 1 if observationibelongs to classl, else 0. Row/column names matchZand class labels.- arguments
A list containing all input arguments.
Item Quality (IQ) Parameter
Controls the discriminative power of observed variables:
IQ = "random"(Default) Category probabilities for each variable-class combination are drawn from a symmetric Dirichlet distribution (
\alpha = 3), resulting in moderate class separation.IQ = numeric(0.5 < IQ < 1) Forces high discriminative power for each variable:
For each variable, two categories per class are assigned extreme probabilities: one category gets probability
IQ, another gets1-IQ.Remaining categories share the residual probability
1 - IQ - (1-IQ) = 0. Note: This requirespoly.value>= 2 for all variables.Category assignments are randomized within classes to avoid structural patterns.
Higher
IQvalues (closer to 1) yield stronger class separation but increase simulation failure risk.
Class Size Distribution
"random"(Default) Class proportions drawn from Dirichlet distribution (
\alpha = 3for all classes), ensuring no empty classes. Sizes are rounded to integers with adjustment for exactN."uniform"Equal probability of class membership (
1/Lper class), sampled with replacement. May produce empty classes ifNis small relative toL.
Response Validation
The simulation enforces a critical constraint: every category of every observed variable must appear at least once in the dataset. If initial generation violates this (e.g., a rare category is missing), parameters and responses are regenerated until satisfied. This ensures compatibility with standard LCA estimation.
Examples
# Example 1: Default settings (moderate separation, random class sizes)
sim_data <- sim.LCA(N = 30, I = 5, L = 3)
# Example 2: High-discrimination items (IQ=0.85), uniform class sizes
sim_high_disc <- sim.LCA(
N = 30,
I = 4,
L = 2,
poly.value = c(3,4,3,5), # Variable category counts
IQ = 0.85,
distribution = "uniform"
)
# Example 3: Binary items (poly.value=2) with high separation
sim_binary <- sim.LCA(N = 300, I = 10, L = 2, poly.value = 2, IQ = 0.9)
Simulate Data for Latent Profile Analysis
Description
Generates synthetic multivariate continuous data from a latent profile model with L latent classes.
Supports flexible covariance structure constraints (including custom equality constraints) and
class size distributions. All covariance matrices are ensured to be positive definite.
Usage
sim.LPA(
N = 1000,
I = 5,
L = 2,
constraint = "VV",
distribution = "random",
mean.range = c(-2, 2),
covs.range = c(0.01, 4),
params = NULL,
is.sort = TRUE
)
Arguments
N |
Integer; total number of observations to simulate. Must be |
I |
Integer; number of continuous observed variables. Must be |
L |
Integer; number of latent profiles (classes). Must be |
constraint |
Character string or list specifying covariance constraints. See detailed description below.
Default is |
distribution |
Character; distribution of class sizes. Options: |
mean.range |
Numeric vector of length 2; range for sampling class-specific means.
Each variable's means are sampled uniformly from |
covs.range |
Numeric vector of length 2; range for sampling variance parameters (diagonal elements).
Must satisfy |
params |
List with fixed parameters for simulation:
|
is.sort |
A logical value. If |
Details
Mean Generation: For each variable, 3L candidate means are sampled uniformly from mean.range.
L distinct means are selected without replacement to ensure separation between classes.
Covariance Generation:
-
Positive Definiteness: All covariance matrices are adjusted using
Matrix::nearPDand eigenvalue thresholds (> 10^{-8}) to guarantee validity. Failed attempts trigger explicit errors. -
Univariate Case (
I=1): Constraints"UE"and"UV"are enforced automatically. Predefined constraints like"E0"map to"UE". -
VE Constraint: Requires special handling—base off-diagonal elements are fixed, and diagonals are sampled above a minimum threshold to maintain positive definiteness. May fail if
covs.rangeis too narrow.
Class Assignment:
-
"random": Uses Dirichlet distribution (\alpha = 3) to avoid extremely small classes. Sizes are rounded and adjusted to sum exactly toN. -
"uniform": Simple random sampling with equal probability. May produce empty classes ifNis small.
Data Generation: Observations are simulated using mvtnorm::rmvnorm per class.
Final data and class labels are shuffled to remove ordering artifacts.
Value
A list containing:
- response
Numeric matrix (
N \times I) of simulated observations. Rows are observations, columns are variables named"V1","V2", ..., or"UV"for univariate data.- means
Numeric matrix (
L \times I) of true class-specific means. Row names:"Class1","Class2", ...; column names matchresponse.- covs
Array (
I \times I \times L) of true class-specific covariance matrices. Dimensions: variables x variables x classes. Constrained parameters have identical values across class slices. Dimension names matchresponseand class labels.- P.Z.Xn
Numeric matrix (
N \times L) of true class membership probabilities (one-hot encoded). Rowi, columnl= 1 if observationibelongs to classl, else 0. Row names:"O1","O2", ...; column names:"Class1","Class2", ...- P.Z
Numeric vector (length
L) of true class proportions. Named with class labels (e.g.,"Class1").- Z
Integer vector (length
N) of true class assignments (1 to L). Named with observation IDs (e.g.,"O1").- constraint
Original constraint specification (character string or list) passed to the function.
Covariance Constraints
The constraint parameter controls equality constraints on covariance parameters across classes:
- Predefined Constraints (Character Strings):
-
"UE"(Univariate only)Equal variance across all classes.
"UV"(Univariate only)Varying variances across classes.
"E0"Equal variances across classes, zero covariances (diagonal matrix with shared variances).
"V0"Varying variances across classes, zero covariances (diagonal matrix with free variances).
"EE"Equal full covariance matrix across all classes (homogeneous).
"EV"Equal variances but varying covariances (equal diagonal, free off-diagonal).
"VE"Varying variances but equal correlations (free diagonal, equal correlation structure).
"VV"Varying full covariance matrices across classes (heterogeneous; default).
- Custom Constraints (List of integer vectors):
-
Each element specifies a pair of variables whose covariance parameters are constrained equal across classes:
c(i,i)Constrains variance of variable
ito be equal across all classes.c(i,j)Constrains covariance between variables
iandjto be equal across all classes (symmetric: automatically includesc(j,i)).
Unconstrained parameters vary freely. The algorithm ensures positive definiteness by:
Generating a base positive definite matrix
S0.Applying constraints via a logical mask.
Adjusting unconstrained variances to maintain positive definiteness.
Critical requirements for custom constraints:
- At least one variance must be unconstrained if any off-diagonal covariance is unconstrained.
- All indices must be between 1 and
I. - For univariate data (
I=1), onlylist(c(1,1))is valid.
Class Size Distribution
"random"(Default) Class proportions drawn from Dirichlet distribution (
\alpha = 3for all classes), ensuring no empty classes. Sizes are rounded to integers with adjustment for exactN."uniform"Equal probability of class membership (
1/Lper class), sampled with replacement.
Examples
# Example 1: Bivariate data, 3 classes, heterogeneous covariances (default)
sim_data <- sim.LPA(N = 500, I = 2, L = 3, constraint = "VV")
# Example 2: Univariate data, equal variances
# 'E0' automatically maps to 'UE' for I=2
sim_uni <- sim.LPA(N = 200, I = 2, L = 2, constraint = "E0")
# Example 3: Custom constraints
# - Equal covariance between V1 and V2 across classes
# - Equal variance for V3 across classes
sim_custom <- sim.LPA(
N = 300,
I = 3,
L = 4,
constraint = list(c(1, 2), c(3, 3))
)
# Example 4: VE constraint (varying variances, equal correlations)
sim_ve <- sim.LPA(N = 400, I = 3, L = 3, constraint = "VE")
# Example 5: Uniform class sizes
sim_uniform <- sim.LPA(N = 300, I = 4, L = 5, distribution = "uniform")
Simulate Data for Latent Transition Analysis (LTA)
Description
Simulates longitudinal latent class/profile data where initial class membership and transition probabilities may be influenced by time-varying covariates. Supports both Latent Class Analysis (LCA) for categorical outcomes and Latent Profile Analysis (LPA) for continuous outcomes. Measurement invariance is assumed by default (identical item parameters across time).
Usage
sim.LTA(
N = 500,
I = 5,
L = 3,
distribution = "random",
times = 2,
type = "LCA",
rate = NULL,
constraint = "VV",
mean.range = c(-2, 2),
covs.range = c(0.01, 4),
poly.value = 5,
IQ = "random",
params = NULL,
is.sort = TRUE,
covariates = NULL,
beta = NULL,
gamma = NULL
)
Arguments
N |
Integer; sample size. |
I |
Integer; number of observed items/indicators per time point. |
L |
Integer; number of latent classes/profiles. |
distribution |
Character; distribution of initial class probabilities when not using covariates or |
times |
Integer; number of time points (must be |
type |
Character; type of latent model. |
rate |
List of matrices or NULL; transition probability matrices for non-covariate mode.
Each matrix is |
constraint |
Character; covariance structure for LPA ( |
mean.range |
Numeric vector; range for randomly generated class means in LPA (default: |
covs.range |
Numeric vector; range for covariance matrix diagonals in LPA (default: |
poly.value |
Integer; number of categories for polytomous LCA items (default: 5). |
IQ |
Character; method for generating item discrimination in LCA. |
params |
List or NULL; pre-specified parameters for reproducibility (see Details). |
is.sort |
A logical value. If |
covariates |
List of matrices or NULL; covariate matrices for each time point. Each matrix must have
dimensions |
beta |
Matrix or NULL; initial state regression coefficients of dimension |
gamma |
List or NULL; transition regression coefficients. Must be a list of length |
Details
Covariate Requirements:
Covariate matrices must include an intercept (first column = 1). If omitted, the function adds an intercept and issues a warning.
When
covariatesis provided butbetaorgammaisNULL, coefficients are randomly generated from\text{Uniform}(-1, 1)(non-reference classes only).The reference class (
L) always has zero coefficients (\boldsymbol{\beta}_L = \mathbf{0},\boldsymbol{\gamma}_{l,L} = \mathbf{0}).
Parameter Compatibility:
Use
paramsto fix item parameters (LCA) or class means/covariances (LPA) across simulations.In non-covariate mode,
ratemust be a list of(times-1)valid transition matrices (ignored whentimes=1).In covariate mode with
times>=2, all three (covariates,beta,gamma) must be consistent in dimensions.
Value
A list of class "sim.LTA" containing:
responsesList of length
times; observed data matrices (N \times I).ZsList of length
times; true latent class memberships (N \times 1vectors).P.ZsList of length
times; marginal class probabilities at each time.parItem parameters for LCA (if
type="LCA").meansClass means for LPA (if
type="LPA").covsClass covariance matrices for LPA (if
type="LPA").rateTrue transition matrices (non-covariate mode only;
NULLwhentimes=1).covariatesList of covariate matrices used (covariate mode only).
betaTrue initial state coefficients (covariate mode only).
gammaTrue transition coefficients (covariate mode only;
NULLwhentimes=1).callFunction call.
argumentsInput arguments.
Model Specification
- Initial Class Probabilities (with covariates):
-
For observation/participant
nat time 1, the probability of belonging to latent classlis:P(Z_{n1} = l \mid \mathbf{X}_{n1}) = \frac{\exp(\boldsymbol{\beta}_l^\top \mathbf{X}_{n1})} {\sum_{k=1}^L \exp(\boldsymbol{\beta}_k^\top \mathbf{X}_{n1})}where
\mathbf{X}_{n1} = (X_{n10}, X_{n11}, \dots, X_{n1M})^\topis the covariate vector for observation/participantnat time 1, withX_{n10} = 1(intercept term) andX_{n1m}(m=1,\dots,M) representing the value of them-th covariate. The coefficient vector\boldsymbol{\beta}_l = (\beta_{l0}, \beta_{l1}, \dots, \beta_{lM})^\topcorresponds element-wise to\mathbf{X}_{n1}, where\beta_{l0}is the intercept and\beta_{lm}(m \geq 1) are regression coefficients for covariates. ClassLis the reference class (\boldsymbol{\beta}_L = \mathbf{0}). - Transition Probabilities (with covariates and times>=2):
-
For observation/participant
ntransitioning from classlat timet-1to classkat timet(t \geq 2):P(Z_{nt} = k \mid Z_{n,t-1} = l, \mathbf{X}_{nt}) = \frac{\exp(\boldsymbol{\gamma}_{lkt}^\top \mathbf{X}_{nt})} {\sum_{j=1}^L \exp(\boldsymbol{\gamma}_{ljt}^\top \mathbf{X}_{nt})}where
\mathbf{X}_{nt} = (X_{nt0}, X_{nt1}, \dots, X_{ntM})^\topis the covariate vector at timet, withX_{nt0} = 1(intercept) andX_{ntm}(m=1,\dots,M) as them-th covariate value. The coefficient vector\boldsymbol{\gamma}_{lkt} = (\gamma_{lkt0}, \gamma_{lkt1}, \dots, \gamma_{lktM})^\topcorresponds element-wise to\mathbf{X}_{nt}, where\gamma_{lkt0}is the intercept and\gamma_{lktm}(m \geq 1) are regression coefficients. ClassLis the reference class (\boldsymbol{\gamma}_{lLt} = \mathbf{0}for alll). - Without Covariates or When times=1:
-
Initial probabilities follow a multinomial distribution with probabilities
\boldsymbol{\pi} = (\pi_1, \dots, \pi_L). Whentimes \geq 2, transitions follow a Markov process with fixed probabilities\tau_{lk}^{(t)} = P(Z_t = k \mid Z_{t-1} = l), where\sum_{k=1}^L \tau_{lk}^{(t)} = 1for eachlandt.
Examples
####################### Example 1: Single time point (times=1) ######################
library(LCPA)
set.seed(123)
sim_single <- sim.LTA(N = 200, I = 4, L = 3, times = 1, type = "LCA")
print(sim_single)
####################### Example 2: LPA without covariates ######################
set.seed(123)
sim_lta <- sim.LTA(N = 200, I = 3, L = 3, times = 3, type = "LPA", constraint = "VE")
print(sim_lta)
################## Example 3: With custom covariates (times>=2) ######################
set.seed(123)
N <- 200 ## sample size
## Covariates at time point T1
covariates.inter <- rep(1, N) # Intercept term is always 1 for each n
covariates.X1 <- rnorm(N) # Covariate X1 is a continuous variable
covariates.X2 <- rbinom(N, 1, 0.5) # Covariate X2 is a binary variable
covariates.X1.X2 <- covariates.X1 * covariates.X2 # Interaction between covariates X1 and X2
covariates.T1 <- cbind(inter=covariates.inter, X1=covariates.X1,
X2=covariates.X2, X1.X2=covariates.X1.X2) # Combine into covariates at T1
## Covariates at time point T2
covariates.inter <- rep(1, N) # Intercept term is always 1 for each n
covariates.X1 <- rnorm(N) # Covariate X1 is a continuous variable
covariates.X2 <- rbinom(N, 1, 0.5) # Covariate X2 is a binary variable
covariates.X1.X2 <- covariates.X1 * covariates.X2 # Interaction between covariates X1 and X2
covariates.T2 <- cbind(inter=covariates.inter, X1=covariates.X1,
X2=covariates.X2, X1.X2=covariates.X1.X2) # Combine into covariates at T2
covariates <- list(t1=covariates.T1, t2=covariates.T2) # Combine into final covariates list
## Simulate beta coefficients
# 3x3 matrix (last column is zero because the last category is used as reference)
beta <- matrix(c( 0.8, -0.5, 0.0,
-0.3, -0.4, 0.0,
0.2, 0.8, 0.0,
-0.1, 0.2, 0.0), ncol=3, byrow=TRUE)
## Simulate gamma coefficients (only needed when times>=2)
gamma <- list(
lapply(1:3, function(l) {
lapply(1:3, function(k) if(k < 3)
runif(4, -1.0, 1.0) else c(0, 0, 0, 0)) # Last class as reference
})
)
## Simulate the data
sim_custom <- sim.LTA(
N=N, I=4, L=3, times=2, type="LPA",
covariates=covariates,
beta=beta,
gamma=gamma
)
summary(sim_custom)
Generate a Random Correlation Matrix via C-Vine Partial Correlations
Description
This function generates a random I \times I correlation matrix using the C-vine partial correlation
parameterization described in Joe & Kurowicka (2026). The method constructs the matrix recursively using
partial correlations organized in a C-vine structure, with distributional properties controlled by LKJ
concentration and skewness parameters.
Usage
sim.correlation(
I,
eta = 1,
skew = 0,
positive = FALSE,
permute = TRUE,
maxiter = 10
)
Arguments
I |
Dimension of the correlation matrix (must be |
eta |
LKJ concentration parameter ( |
skew |
Skewness parameter (
|
positive |
Logical. If |
permute |
Logical. If |
maxiter |
Integer. Maximum number of generation attempts before numerical adjustment when |
Details
The algorithm follows four key steps:
Partial correlation sampling: For tree level
k = 1, \dots, I-1and nodej = k+1, \dots, I, partial correlations\rho_{k,j \mid 1:(k-1)}are sampled as:\alpha_k = \eta + \frac{I - k - 1}{2}, \quad a_k = \alpha_k (1 + \text{skew}), \quad b_k = \alpha_k (1 - \text{skew})If
positive = FALSE:\rho_{k,j} \sim 2 \cdot \mathrm{Beta}(a_k, b_k) - 1If
positive = TRUE:\rho_{k,j} \sim \mathrm{Beta}(a_k, b_k)
Recursive matrix construction (C-vine): The correlation matrix
\mathbf{R}is built without matrix inversion using backward recursion:Tree 1 (raw correlations):
R_{1j} = \rho_{1,j}forj = 2,\dots,ITrees
l \geq 2: For pairs(l,j)wherel = 2,\dots,I-1andj = l+1,\dots,I:c \gets \rho_{l,j \mid 1:(l-1)} \\ \text{for } k = l-1 \text{ down to } 1: \\ \quad c \gets c \cdot \sqrt{(1 - \rho_{k,l}^2)(1 - \rho_{k,j}^2)} + \rho_{k,l} \cdot \rho_{k,j} \\ R_{lj} \gets c
This implements the dynamic programming approach from Joe & Kurowicka (2026, Section 2.1).
Positive definiteness enforcement (when
positive = TRUE):Attempt up to
maxitergenerationsOn failure, project to nearest positive definite correlation matrix using
nearPDwithcorr = TRUEFinal matrix has minimum eigenvalue > 1e-8
Exchangeability (optional): If
permute = TRUE, rows/columns are randomly permuted before returning the matrix.
Value
An I \times I positive definite correlation matrix with unit diagonal.
Note
When positive = TRUE, the function guarantees positive definiteness either through direct generation
(with retries) or numerical projection. The theoretical guarantee \eta > (I-2)/2 is recommended for high dimensions.
References
Joe, H., & Kurowicka, D. (2026). Random correlation matrices generated via partial correlation C-vines. Journal of Multivariate Analysis, 211, 105519. https://doi.org/10.1016/j.jmva.2025.105519
Examples
# Default 3x3 correlation matrix
sim.correlation(3)
# 5x5 matrix concentrated near identity (eta=3)
sim.correlation(5, eta = 3)
# Skewed toward positive correlations (no permutation)
sim.correlation(4, skew = 0.7, permute = FALSE)
# Positive partial correlations (enforced positive definiteness)
R <- sim.correlation(6, positive = TRUE)
min(eigen(R, symmetric = TRUE, only.values = TRUE)$values) # > 1e-8
# High-dimensional case (I=20) with theoretical guarantee
R <- sim.correlation(20, eta = 10) # eta=10 > (20-2)/2=9
min(eigen(R, symmetric = TRUE, only.values = TRUE)$values)
S3 Methods: summary
Description
Generates structured, comprehensive summaries of objects produced by the LCPA package.
This generic function dispatches to class-specific methods that extract and organize key information
including model configurations, fit statistics, parameter estimates, simulation truths, and diagnostics.
Designed for programmatic access and downstream reporting.
Usage
## S3 method for class 'LCA'
summary(object, digits = 4, I.max = 5, ...)
## S3 method for class 'LPA'
summary(object, digits = 4, I.max = 5, ...)
## S3 method for class 'LTA'
summary(object, digits = 4, I.max = 5, ...)
## S3 method for class 'LCPA'
summary(object, digits = 4, I.max = 5, ...)
## S3 method for class 'sim.LCA'
summary(object, digits = 4, I.max = 5, ...)
## S3 method for class 'sim.LPA'
summary(object, digits = 4, I.max = 5, ...)
## S3 method for class 'sim.LTA'
summary(object, digits = 4, I.max = 5, L.max = 5, ...)
## S3 method for class 'fit.index'
summary(object, digits = 4, ...)
## S3 method for class 'compare.model'
summary(object, digits = 4, ...)
## S3 method for class 'SE'
summary(object, ...)
Arguments
object |
An object of one of the following classes:
|
digits |
Number of decimal places for numeric output (default: 4). Applied universally across all methods. |
I.max |
Maximum number of variables/items to display ( |
... |
Additional arguments passed to or from other methods (currently ignored). |
L.max |
Maximum number of latent classes/profiles to display before truncation ( |
Details
Each method returns a named list with class-specific components optimized for structured access:
LCAReturns a
summary.LCAobject with components:callOriginal function call.
model.configList:
latent_classes,method.data.infoList:
N,I,poly.value,uniform_categories.fit.statsList:
LogLik,AIC,BIC,entropy,npar.class.probsData frame:
Class,Count,Proportion.item.probsList of matrices (first
I.maxitems) with conditional probabilities per class/category.convergenceList: algorithm, iterations, tolerance, loglik change, hardware (if applicable).
replicationList:
nrep,best_BIC(if multiple replications performed).digits,I.max.shown,total.itemsMetadata for printing/formatting.
LPAReturns a
summary.LPAobject with components:callOriginal function call.
model.configList:
latent_profiles,constraint,cov_structure,method.data.infoList:
N,I,distribution.fit.statsList:
LogLik,AIC,BIC,entropy,npar.class.probsData frame:
Profile,Count,Proportion.class.meansMatrix (first
I.maxvariables) of profile-specific means.convergenceList: algorithm, iterations, tolerance, loglik change, hardware (if applicable).
replicationList:
nrep,best_BIC(if multiple replications performed).digits,I.max.shown,total.varsMetadata for printing/formatting.
LCPAReturns a
summary.LCPAobject with components:callOriginal function call.
model.configList:
latent_classes,model_type,reference_class,covariates_mode,CEP_handling.data.infoList:
sample_size,variables.fit.statsList:
LogLik,AIC,BIC,npar.class.probsData frame:
Class,Probability,Proportion,Frequency.coefficientsData frame: regression coefficients for non-reference classes (Estimate, Std_Error, z_value, p_value, 95% CI).
reference_classInteger: reference class for multinomial logit.
convergenceList:
iterations,coveraged,converg_note.digits,I.max.shown,total.vars,has.covariatesMetadata for printing/formatting.
LTAReturns a
summary.LTAobject with components:callOriginal function call.
model.configList:
time_points,latent_classes,model_type,reference_class,covariates_mode,CEP_handling,transition_mode.data.infoList:
sample_size,variables,time_points.fit.statsList:
LogLik,AIC,BIC,npar.class.probsList of data frames (per time point):
Class,Probability,Proportion,Frequency.initial_modelList:
coefficients(data frame),covariate_names,reference_class.transition_modelsNamed list of data frames: transition coefficients per time interval (From_Class, To_Class, Estimate, Std_Error, etc.).
reference_classInteger: reference destination class for transitions.
convergenceList:
iterations,coveraged,converg_note.digits,I.max.shown,total.vars,covariates.timeCrossMetadata for printing/formatting.
sim.LCAReturns a
summary.sim.LCAobject with components:callOriginal simulation call.
configList:
N,I,L,poly.value,uniform_categories,IQ,distribution.class.probsData frame:
Class,Probability,Frequency.item.probsList of matrices (first
I.maxitems) with true conditional probabilities per class/category.digits,I.max.shown,total.varsMetadata for printing/formatting.
sim.LPAReturns a
summary.sim.LPAobject with components:callOriginal simulation call.
configList:
N,I,L,constraint,constraint_desc,distribution.class.probsData frame:
Profile,Probability,Frequency.class.meansMatrix (first
I.maxvariables) of true profile-specific means.cov_structureCharacter: detailed description of covariance constraints.
digits,I.max.shown,total.varsMetadata for printing/formatting.
sim.LTAReturns a
summary.sim.LTAobject with components:callOriginal simulation call.
configList:
N,I,L,times,type,distribution,constraint(if LPA).class.probsList of data frames (per time point):
Class,Probability,Frequency.item.probsNested list (by time/item) of true conditional probabilities (if
type="LCA").class.meansList of matrices (by time) of true profile means (if
type="LPA").transitionList:
mode("fixed" or "covariate"),rateorbeta/gammacoefficients,time_points.covariatesList of data frames (per time point) with covariate summaries (Min, Max, Mean), if present.
digits,I.max.shown,L.max.shown,total.vars,total.classesMetadata for printing/formatting.
fit.indexReturns a
summary.fit.indexobject with components:callFunction call that generated the fit indices.
data.infoList:
N.fit.tableData frame:
Statistic,Value,Descriptionfor -2LL, AIC, BIC, SIC, CAIC, AWE, SABIC.digitsNumeric: precision used for formatting.
compare.modelReturns a
summary.compare.modelobject with components:callFunction call that generated the comparison.
data.infoList:
N,I,L(named vector for two models).fit.tableData frame comparing fit indices for both models.
model_comparisonData frame:
Classes,npar,AvePP,Entropy.BFNumeric: Bayes Factor value (if computed).
BF_interpretationCharacter: interpretive guidance for Bayes Factor.
lrt_tableData frame:
Test,Statistic,DF,p-value,Sig(significance markers).lrt_objectsList: raw hypothesis test objects for further inspection.
digitsNumeric: precision used for formatting.
SEReturns a
summary.SEobject with components:callOriginal function call.
methodCharacter: "Obs" or "Bootstrap".
diagnosticsList: method-specific diagnostic info (e.g., n.Bootstrap, hessian_cond_number).
model_typeCharacter: "LCA" or "LPA".
LInteger: number of latent classes/profiles.
IInteger: number of variables/items (NA if unknown).
nonzero_countsList: counts of non-zero SEs by parameter type (P.Z, means/par, covs).
total_PZInteger: total number of class probability parameters.
Value
Invisibly returns a structured list containing summary components.
The exact structure depends on the class of object. All returned objects carry an appropriate
S3 class (e.g., summary.LCA, summary.LPA) for use with corresponding print methods.
Methods (by class)
-
summary(LCA): Summary method forLCAobjects -
summary(LPA): Summary method forLPAobjects -
summary(LTA): Summary method forLTAobjects -
summary(LCPA): Summary method forLCPAobjects -
summary(sim.LCA): Summary method forsim.LCAobjects -
summary(sim.LPA): Summary method forsim.LPAobjects -
summary(sim.LTA): Summary method forsim.LTAobjects -
summary(fit.index): Summary method forfit.indexobjects -
summary(compare.model): Summary method forcompare.modelobjects -
summary(SE): Summary method forsummary.SEobjects
S3 Methods: update
Description
The update function provides a unified and convenient interface to refresh or modify
existing objects generated by the LCPA package. It allows users to re-run model fitting
or data simulation with new parameter settings while preserving all other original configurations.
Supported classes include: LCA, LPA,
LCPA, LTA, sim.LCA,
sim.LPA, and sim.LTA.
Usage
update(x, ...)
## S3 method for class 'LCA'
update(x, ...)
## S3 method for class 'LPA'
update(x, ...)
## S3 method for class 'LCPA'
update(x, ...)
## S3 method for class 'LTA'
update(x, ...)
## S3 method for class 'sim.LCA'
update(x, ...)
## S3 method for class 'sim.LPA'
update(x, ...)
## S3 method for class 'sim.LTA'
update(x, ...)
Arguments
x |
An object of one of the following classes: |
... |
Additional named arguments passed to override or extend the original call.
Valid arguments depend on the class of
|
Details
Internally, each method extracts the stored arguments list from the input object x,
merges it with user-provided ... using modifyList, then re-invokes
the corresponding constructor function (LCA(), LPA(), LCPA(), LTA(),
sim.LCA(), etc.) with the merged argument list.
This ensures that:
Only explicitly overridden parameters are changed.
Default values from the original call remain intact.
Complex nested structures (e.g., control lists) can be partially updated.
Note: If an invalid argument is passed (e.g., constraint to LCA), it will be silently ignored
unless the underlying constructor validates inputs.
Value
An object of the same class as x, reconstructed using the original arguments
updated with any provided in .... All unchanged parameters are preserved from the original call.
Methods (by class)
-
update(LCA): Update method forLCAobjects -
update(LPA): Update method forLPAobjects -
update(LCPA): Update method forLCPAobjects -
update(LTA): Update method forLTAobjects -
update(sim.LCA): Update method forsim.LCAobjects -
update(sim.LPA): Update method forsim.LPAobjects -
update(sim.LTA): Update method forsim.LTAobjects
Examples
library(LCPA)
# --- Update LCA ---
data <- sim.LCA(N=500, I=5, L=3)
lca.obj <- LCA(data$response, L=3)
lca.updated <- update(lca.obj, method="EM", nrep=5)
# --- Update LPA ---
data2 <- sim.LPA(N=300, I=4, L=2)
lpa.obj <- LPA(data2$response, L=2, constraint="VE")
lpa.updated <- update(lpa.obj, constraint="VV")
# --- Update Simulation Objects ---
sim.obj1 <- sim.LCA(N=1000)
sim.obj1_updated <- update(sim.obj1, N=2000, IQ=0.8)
sim.obj2 <- sim.LPA(I=6)
sim.obj2_updated <- update(sim.obj2, I=8, mean.range=c(-2,2))
sim.obj3 <- sim.LTA(N=200, I=5, L=2, times=3)
sim.obj3_updated <- update(sim.obj3, N=300, times=4, constraint="ER")