1 Introduction

This case study demonstrates a complete analysis pipeline using missoNet for genomic data analysis, specifically for regional DNA methylation QTL (meQTL) mapping. We’ll analyze how genetic variants (SNPs) influence DNA methylation levels across multiple CpG sites while accounting for missing data and network structure among CpG sites.

1.1 Scientific Context

In epigenome-wide association studies (EWAS):

Predictors: Single nucleotide polymorphisms (SNPs)
Responses: DNA methylation levels at CpG sites
Challenge: Missing methylation measurements due to technical issues
Goal: Identify genetic regulation of methylation and CpG co-regulation patterns

2 Data Generation and Preparation

# Load required packages
library(missoNet)
library(ggplot2)
library(reshape2)
library(gridExtra)

# Set ggplot theme
theme_set(theme_minimal(base_size = 11))

2.1 Simulating Realistic Genomic Data

# Set dimensions mimicking a genomic region
n <- 600   # Number of samples
p <- 80   # Number of SNPs in the region
q <- 20    # Number of CpG sites

# Create realistic correlation structure
# SNPs: Linkage disequilibrium (LD) blocks
create_ld_structure <- function(p, n_blocks = 4, within_block_cor = 0.7) {
  Sigma <- matrix(0, p, p)
  block_size <- p / n_blocks
  
  for (b in 1:n_blocks) {
    idx <- ((b-1)*block_size + 1):(b*block_size)
    for (i in idx) {
      for (j in idx) {
        Sigma[i, j] <- within_block_cor^abs(i - j)
      }
    }
  }
  diag(Sigma) <- 1
  return(Sigma)
}

Sigma.X <- create_ld_structure(p, n_blocks = 4)

# Variable missing rates (technical dropout patterns)
# Higher missingness for CpG sites with extreme GC content
gc_content <- runif(q, 0.3, 0.7)  # Simulated GC content
rho_vec <- 0.01 + 0.3 * (abs(gc_content - 0.5) / 0.2)^2  # U-shaped missing pattern

# Generate the data
sim <- generateData(
  n = n,
  p = p,
  q = q,
  rho = rho_vec,  # Missing rates
  missing.type = "MAR",  # Missing depends on technical factors
  Sigma.X = Sigma.X,
  Beta.row.sparsity = 0.15,  # 15% of SNPs are meQTLs
  Beta.elm.sparsity = 0.4,     # Each meQTL affects 40% of CpGs
  seed = 100
)

# Add meaningful variable names
colnames(sim$X) <- sprintf("rs%d", 1000000 + 1:p)  # SNP IDs
colnames(sim$Y) <- sprintf("cg%d", 2000000 + 1:q)  # CpG IDs
colnames(sim$Z) <- colnames(sim$Y)
rownames(sim$X) <- rownames(sim$Y) <- rownames(sim$Z) <- paste0("Sample", 1:n)

{
  cat("\nDataset Summary:\n")
  cat("================\n")
  cat("Samples:", n, "\n")
  cat("SNPs:", p, "\n")
  cat("CpG sites:", q, "\n")
  cat("Overall missing rate:", sprintf("%.1f%%", mean(is.na(sim$Z)) * 100), "\n")
  cat("True meQTLs:", sum(rowSums(abs(sim$Beta)) > 0), "\n")
}
#> 
#> Dataset Summary:
#> ================
#> Samples: 600 
#> SNPs: 80 
#> CpG sites: 20 
#> Overall missing rate: 9.4% 
#> True meQTLs: 12

3 Exploratory Data Analysis

3.1 Missing Data Patterns

# Analyze missing patterns
miss_by_cpg <- colMeans(is.na(sim$Z))
miss_by_sample <- rowMeans(is.na(sim$Z))

# Create visualization
par(mfrow = c(2, 2))

# 1. Missing rate by CpG
plot(miss_by_cpg, type = "h", lwd = 2, col = "steelblue",
     xlab = "CpG Site", ylab = "Missing Rate",
     main = "Missing Data by CpG Site")
abline(h = mean(miss_by_cpg), col = "red", lty = 2)

# 2. Missing rate by sample
hist(miss_by_sample, breaks = 20, col = "lightblue",
     xlab = "Missing Rate", main = "Distribution of Missing Rates (Samples)")
abline(v = mean(miss_by_sample), col = "red", lwd = 2)

# 3. Heatmap of missingness
image(t(is.na(sim$Z[1:100, ])), col = c("white", "darkred"),
      xlab = "CpG Site", ylab = "Sample (first 100)",
      main = "Missing Data Pattern")

# 4. Correlation of missingness with GC content
plot(gc_content, miss_by_cpg, pch = 19, col = "darkblue",
     xlab = "GC Content", ylab = "Missing Rate",
     main = "Technical Dropout vs GC Content")
lines(lowess(gc_content, miss_by_cpg), col = "red", lwd = 2)

4 Methylation Correlation Structure

# Use complete data for visualization
Y_complete <- sim$Y
cor_cpg <- cor(Y_complete)

# Create enhanced heatmap
library(RColorBrewer)
colors <- colorRampPalette(brewer.pal(9, "RdBu"))(100)

heatmap(cor_cpg, 
        col = rev(colors),
        symm = TRUE,
        main = "CpG Methylation Correlation Structure",
        xlab = "CpG Sites", ylab = "CpG Sites",
        margins = c(8, 8))


# Identify CpG modules
hc <- hclust(as.dist(1 - abs(cor_cpg)))
modules <- cutree(hc, k = 3)

cat("\nCpG modules identified:", table(modules), "\n")
#> 
#> CpG modules identified: 11 5 4

5 Model Fitting

5.1 Initial Parameter Selection

fit_initial <- missoNet(
  X = sim$X,
  Y = sim$Z,
  GoF = "BIC",
  adaptive.search = TRUE,  # Fast exploration
  verbose = 1
)
#> 
#> =============================================================
#>                           missoNet
#> =============================================================
#> 
#> > Initializing model...
#> 
#> --- Model Configuration -------------------------------------
#>   Data dimensions:      n =   600, p =   80, q =   20
#>   Missing rate (avg):     9.4%
#>   Selection criterion:  BIC
#>   Lambda grid:          adaptive (fast pre-test)
#> 
#>   - Stage: 1/2 (Searching lambda.beta)
#>   |                                                          |                                                  |   0%  |                                                          |=                                                 |   2%  |                                                          |==                                                |   5%  |                                                          |====                                              |   8%  |                                                          |=====                                             |  10%  |                                                          |======                                            |  12%  |                                                          |========                                          |  15%  |                                                          |=========                                         |  18%  |                                                          |==========                                        |  20%  |                                                          |===========                                       |  22%  |                                                          |============                                      |  25%  |                                                          |==============                                    |  28%  |                                                          |===============                                   |  30%  |                                                          |================                                  |  32%  |                                                          |==================                                |  35%  |                                                          |===================                               |  38%  |                                                          |====================                              |  40%  |                                                          |=====================                             |  42%  |                                                          |======================                            |  45%  |                                                          |========================                          |  48%  |                                                          |=========================                         |  50%  |                                                          |==========================                        |  52%  |                                                          |============================                      |  55%  |                                                          |=============================                     |  58%  |                                                          |==============================                    |  60%  |                                                          |===============================                   |  62%  |                                                          |================================                  |  65%  |                                                          |==================================                |  68%  |                                                          |===================================               |  70%  |                                                          |====================================              |  72%  |                                                          |======================================            |  75%  |                                                          |=======================================           |  78%  |                                                          |========================================          |  80%  |                                                          |=========================================         |  82%  |                                                          |==========================================        |  85%  |                                                          |============================================      |  88%  |                                                          |=============================================     |  90%  |                                                          |==============================================    |  92%  |                                                          |================================================  |  95%  |                                                          |================================================= |  98%  |                                                          |==================================================| 100%
#>   - Stage: 2/2 (Searching lambda.theta)
#>   |                                                          |                                                  |   0%  |                                                          |=                                                 |   2%  |                                                          |==                                                |   4%  |                                                          |===                                               |   6%  |                                                          |====                                              |   8%  |                                                          |=====                                             |  10%  |                                                          |======                                            |  12%  |                                                          |=======                                           |  14%  |                                                          |========                                          |  16%  |                                                          |=========                                         |  18%  |                                                          |==========                                        |  20%  |                                                          |===========                                       |  22%  |                                                          |============                                      |  24%  |                                                          |=============                                     |  26%  |                                                          |==============                                    |  28%  |                                                          |===============                                   |  30%  |                                                          |================                                  |  32%  |                                                          |=================                                 |  34%  |                                                          |==================                                |  36%  |                                                          |===================                               |  38%  |                                                          |====================                              |  40%  |                                                          |=====================                             |  42%  |                                                          |======================                            |  44%  |                                                          |=======================                           |  46%  |                                                          |========================                          |  48%  |                                                          |=========================                         |  50%  |                                                          |==========================                        |  52%  |                                                          |===========================                       |  54%  |                                                          |============================                      |  56%  |                                                          |=============================                     |  58%  |                                                          |==============================                    |  60%  |                                                          |===============================                   |  62%  |                                                          |================================                  |  64%  |                                                          |=================================                 |  66%  |                                                          |==================================                |  68%  |                                                          |===================================               |  70%  |                                                          |====================================              |  72%  |                                                          |=====================================             |  74%  |                                                          |======================================            |  76%  |                                                          |=======================================           |  78%  |                                                          |========================================          |  80%  |                                                          |=========================================         |  82%  |                                                          |==========================================        |  84%  |                                                          |===========================================       |  86%  |                                                          |============================================      |  88%  |                                                          |=============================================     |  90%  |                                                          |==============================================    |  92%  |                                                          |===============================================   |  94%  |                                                          |================================================  |  96%  |                                                          |================================================= |  98%  |                                                          |==================================================| 100%
#> 
#>   Lambda grid size:     14 x 40 = 560 models
#> -------------------------------------------------------------
#> 
#> --- Optimization Progress -----------------------------------
#>   Stage 1: Initializing warm starts
#>   Stage 2: Grid search (sequential)
#> -------------------------------------------------------------
#> 
#>   |                                                          |                                                  |   0%  |                                                          |                                                  |   1%  |                                                          |=                                                 |   1%  |                                                          |=                                                 |   2%  |                                                          |=                                                 |   3%  |                                                          |==                                                |   3%  |                                                          |==                                                |   4%  |                                                          |==                                                |   5%  |                                                          |===                                               |   5%  |                                                          |===                                               |   6%  |                                                          |===                                               |   7%  |                                                          |====                                              |   7%  |                                                          |====                                              |   8%  |                                                          |====                                              |   9%  |                                                          |=====                                             |   9%  |                                                          |=====                                             |  10%  |                                                          |=====                                             |  11%  |                                                          |======                                            |  11%  |                                                          |======                                            |  12%  |                                                          |======                                            |  13%  |                                                          |=======                                           |  13%  |                                                          |=======                                           |  14%  |                                                          |=======                                           |  15%  |                                                          |========                                          |  15%  |                                                          |========                                          |  16%  |                                                          |========                                          |  17%  |                                                          |=========                                         |  17%  |                                                          |=========                                         |  18%  |                                                          |=========                                         |  19%  |                                                          |==========                                        |  19%  |                                                          |==========                                        |  20%  |                                                          |==========                                        |  21%  |                                                          |===========                                       |  21%  |                                                          |===========                                       |  22%  |                                                          |===========                                       |  23%  |                                                          |============                                      |  23%  |                                                          |============                                      |  24%  |                                                          |============                                      |  25%  |                                                          |=============                                     |  25%  |                                                          |=============                                     |  26%  |                                                          |=============                                     |  27%  |                                                          |==============                                    |  27%  |                                                          |==============                                    |  28%  |                                                          |==============                                    |  29%  |                                                          |===============                                   |  29%  |                                                          |===============                                   |  30%  |                                                          |===============                                   |  31%  |                                                          |================                                  |  31%  |                                                          |================                                  |  32%  |                                                          |================                                  |  33%  |                                                          |=================                                 |  33%  |                                                          |=================                                 |  34%  |                                                          |=================                                 |  35%  |                                                          |==================                                |  35%  |                                                          |==================                                |  36%  |                                                          |==================                                |  37%  |                                                          |===================                               |  37%  |                                                          |===================                               |  38%  |                                                          |===================                               |  39%  |                                                          |====================                              |  39%  |                                                          |====================                              |  40%  |                                                          |====================                              |  41%  |                                                          |=====================                             |  41%  |                                                          |=====================                             |  42%  |                                                          |=====================                             |  43%  |                                                          |======================                            |  43%  |                                                          |======================                            |  44%  |                                                          |======================                            |  45%  |                                                          |=======================                           |  45%  |                                                          |=======================                           |  46%  |                                                          |=======================                           |  47%  |                                                          |========================                          |  47%  |                                                          |========================                          |  48%  |                                                          |========================                          |  49%  |                                                          |=========================                         |  49%  |                                                          |=========================                         |  50%  |                                                          |=========================                         |  51%  |                                                          |==========================                        |  51%  |                                                          |==========================                        |  52%  |                                                          |==========================                        |  53%  |                                                          |===========================                       |  53%  |                                                          |===========================                       |  54%  |                                                          |===========================                       |  55%  |                                                          |============================                      |  55%  |                                                          |============================                      |  56%  |                                                          |============================                      |  57%  |                                                          |=============================                     |  57%  |                                                          |=============================                     |  58%  |                                                          |=============================                     |  59%  |                                                          |==============================                    |  59%  |                                                          |==============================                    |  60%  |                                                          |==============================                    |  61%  |                                                          |===============================                   |  61%  |                                                          |===============================                   |  62%  |                                                          |===============================                   |  63%  |                                                          |================================                  |  63%  |                                                          |================================                  |  64%  |                                                          |================================                  |  65%  |                                                          |=================================                 |  65%  |                                                          |=================================                 |  66%  |                                                          |=================================                 |  67%  |                                                          |==================================                |  67%  |                                                          |==================================                |  68%  |                                                          |==================================                |  69%  |                                                          |===================================               |  69%  |                                                          |===================================               |  70%  |                                                          |===================================               |  71%  |                                                          |====================================              |  71%  |                                                          |====================================              |  72%  |                                                          |====================================              |  73%  |                                                          |=====================================             |  73%  |                                                          |=====================================             |  74%  |                                                          |=====================================             |  75%  |                                                          |======================================            |  75%  |                                                          |======================================            |  76%  |                                                          |======================================            |  77%  |                                                          |=======================================           |  77%  |                                                          |=======================================           |  78%  |                                                          |=======================================           |  79%  |                                                          |========================================          |  79%  |                                                          |========================================          |  80%  |                                                          |========================================          |  81%  |                                                          |=========================================         |  81%  |                                                          |=========================================         |  82%  |                                                          |=========================================         |  83%  |                                                          |==========================================        |  83%  |                                                          |==========================================        |  84%  |                                                          |==========================================        |  85%  |                                                          |===========================================       |  85%  |                                                          |===========================================       |  86%  |                                                          |===========================================       |  87%  |                                                          |============================================      |  87%  |                                                          |============================================      |  88%  |                                                          |============================================      |  89%  |                                                          |=============================================     |  89%  |                                                          |=============================================     |  90%  |                                                          |=============================================     |  91%  |                                                          |==============================================    |  91%  |                                                          |==============================================    |  92%  |                                                          |==============================================    |  93%  |                                                          |===============================================   |  93%  |                                                          |===============================================   |  94%  |                                                          |===============================================   |  95%  |                                                          |================================================  |  95%  |                                                          |================================================  |  96%  |                                                          |================================================  |  97%  |                                                          |================================================= |  97%  |                                                          |================================================= |  98%  |                                                          |================================================= |  99%  |                                                          |==================================================|  99%  |                                                          |==================================================| 100%
#> 
#> -------------------------------------------------------------
#> 
#> > Refitting optimal model ...
#> 
#> 
#> --- Optimization Results ------------------------------------
#>   Optimal lambda.beta:   4.3508e-01
#>   Optimal lambda.theta:  2.0119e-02
#>   BIC value:              16287.4096
#>   Active predictors:     66 / 80 (82.5%)
#>   Network edges:         46 / 190 (24.2%)
#> -------------------------------------------------------------
#> 
#> =============================================================

# Examine initial selection
{
  cat("\nStep 1: Initial parameter exploration\n")
  cat("=====================================\n")
  cat("  Lambda.beta:", fit_initial$est.min$lambda.beta, "\n")
  cat("  Lambda.theta:", fit_initial$est.min$lambda.theta, "\n")
  cat("  Active SNPs:", sum(rowSums(abs(fit_initial$est.min$Beta)) > 1e-8), "\n")
  cat("  Network edges:", 
      sum(abs(fit_initial$est.min$Theta[upper.tri(fit_initial$est.min$Theta)]) > 1e-8), "\n")
}
#> 
#> Step 1: Initial parameter exploration
#> =====================================
#>   Lambda.beta: 0.4350844 
#>   Lambda.theta: 0.0201192 
#>   Active SNPs: 66 
#>   Network edges: 46

5.2 Cross-Validation

# Define refined grid based on initial exploration
lambda.beta.refined <- 10^(seq(
  log10(min(max(fit_initial$lambda.beta.seq) * 0.9, fit_initial$est.min$lambda.beta * 50)),
  log10(max(max(fit_initial$lambda.beta.seq) * 0.005, fit_initial$est.min$lambda.beta / 20)),
  length.out = 25
))

lambda.theta.refined <- 10^(seq(
  log10(min(max(fit_initial$lambda.theta.seq) * 0.9, fit_initial$est.min$lambda.theta * 50)),
  log10(max(max(fit_initial$lambda.theta.seq) * 0.005, fit_initial$est.min$lambda.theta / 20)),
  length.out = 25
))

# Perform 5-fold cross-validation
cvfit <- cv.missoNet(
  X = sim$X,
  Y = sim$Z,
  kfold = 5,
  lambda.beta = lambda.beta.refined,
  lambda.theta = lambda.theta.refined,
  compute.1se = TRUE,
  verbose = 0,
  seed = 1000
)

5.3 Model Comparison

# Compare different model choices
models <- list(
  "CV Minimum" = cvfit$est.min,
  "1SE Beta" = cvfit$est.1se.beta,
  "1SE Theta" = cvfit$est.1se.theta,
  "Initial BIC" = fit_initial$est.min
)

if (!is.null(models$`1SE Beta`) & !is.null(models$`1SE Theta`)) {  # Ensure models exist
  comparison <- data.frame(
    Model = names(models),
    Lambda.Beta = sapply(models, function(x) x$lambda.beta),
    Lambda.Theta = sapply(models, function(x) x$lambda.theta),
    Active.SNPs = sapply(models, function(x) 
      sum(rowSums(abs(x$Beta)) > 1e-8)),
    Total.Effects = sapply(models, function(x)
      sum(abs(x$Beta) > 1e-8)),
    Network.Edges = sapply(models, function(x)
      sum(abs(x$Theta[upper.tri(x$Theta)]) > 1e-8))
  )
  print(comparison, digits = 4)
}
#>                   Model Lambda.Beta Lambda.Theta Active.SNPs Total.Effects Network.Edges
#> CV Minimum   CV Minimum      0.2328      0.03066          79           458            28
#> 1SE Beta       1SE Beta      0.4023      0.03066          70           249            29
#> 1SE Theta     1SE Theta      0.2328      0.78728          79           434             0
#> Initial BIC Initial BIC      0.4351      0.02012          66           215            46


# Select the more regularized model, fallback if NULL
if (!is.null(models$`1SE Beta`)) {
  final_model <- cvfit$est.1se.beta
} else final_model <- fit_initial$est.min

6 Results Interpretation

6.1 Identified meQTLs

# Extract coefficients
Beta <- final_model$Beta
rownames(Beta) <- colnames(sim$X)
colnames(Beta) <- colnames(sim$Z)

# Identify significant associations
threshold <- 1e-3
sig_meqtls <- which(abs(Beta) > threshold, arr.ind = TRUE)

if (nrow(sig_meqtls) > 0) {
  meqtl_df <- data.frame(
    SNP = rownames(Beta)[sig_meqtls[,1]],
    CpG = colnames(Beta)[sig_meqtls[,2]],
    Effect = Beta[sig_meqtls],
    AbsEffect = abs(Beta[sig_meqtls])
  )
  meqtl_df <- meqtl_df[order(meqtl_df$AbsEffect, decreasing = TRUE), ]
  
  cat("Top 15 meQTL associations:\n")
  print(head(meqtl_df, 15), digits = 3)
  
  # Visualization
  top_snps <- unique(meqtl_df$SNP[1:min(30, nrow(meqtl_df))])
  Beta_subset <- Beta[top_snps, , drop = FALSE]
  
  # Create heatmap
  par(mfrow = c(1, 1))
  colors <- colorRampPalette(c("blue", "white", "red"))(100)
  heatmap(as.matrix(Beta_subset), 
          col = colors,
          scale = "none",
          main = "Top meQTL Effects",
          xlab = "CpG Sites", 
          ylab = "SNPs",
          margins = c(8, 8))
}
#> Top 15 meQTL associations:
#>           SNP       CpG Effect AbsEffect
#> 175 rs1000014 cg2000017   2.07      2.07
#> 83  rs1000046 cg2000009  -2.06      2.06
#> 220 rs1000009 cg2000020   1.98      1.98
#> 90  rs1000056 cg2000010  -1.75      1.75
#> 4   rs1000009 cg2000001  -1.72      1.72
#> 89  rs1000022 cg2000010   1.72      1.72
#> 185 rs1000046 cg2000017  -1.52      1.52
#> 40  rs1000063 cg2000003   1.50      1.50
#> 140 rs1000076 cg2000013  -1.50      1.50
#> 41  rs1000009 cg2000004   1.49      1.49
#> 36  rs1000040 cg2000003  -1.42      1.42
#> 168 rs1000056 cg2000015   1.38      1.38
#> 167 rs1000054 cg2000015  -1.31      1.31
#> 200 rs1000009 cg2000018  -1.28      1.28
#> 151 rs1000076 cg2000014  -1.21      1.21

7 CpG Network Analysis

# Extract precision matrix and convert to partial correlations
Theta <- final_model$Theta
rownames(Theta) <- colnames(Theta) <- colnames(sim$Z)

# Compute partial correlations
partial_cor <- -cov2cor(Theta)
diag(partial_cor) <- 0

# Network statistics
edge_threshold <- 0.1
n_edges <- sum(abs(partial_cor[upper.tri(partial_cor)]) > edge_threshold)
{
  cat("\nNetwork Statistics:\n")
  cat("  Total possible edges:", q * (q-1) / 2, "\n")
  cat("  Selected edges (|r| >", edge_threshold, "):", n_edges, "\n")
  cat("  Network density:", sprintf("%.1f%%", 100 * n_edges / (q * (q-1) / 2)), "\n")
}
#> 
#> Network Statistics:
#>   Total possible edges: 190 
#>   Selected edges (|r| > 0.1 ): 15 
#>   Network density: 7.9%

# Identify hub CpGs
degree <- colSums(abs(partial_cor) > edge_threshold)
hub_cpgs <- names(sort(degree, decreasing = TRUE)[1:5])
cat("\nHub CpG sites (highest connectivity):\n")
#> 
#> Hub CpG sites (highest connectivity):
for (cpg in hub_cpgs) {
  cat("  ", cpg, ": degree =", degree[cpg], "\n")
}
#>    cg2000008 : degree = 3 
#>    cg2000011 : degree = 3 
#>    cg2000012 : degree = 3 
#>    cg2000013 : degree = 3 
#>    cg2000014 : degree = 3

# Visualize network
if (requireNamespace("igraph", quietly = TRUE)) {
  library(igraph)
  
  # Create network from significant edges
  edges <- which(abs(partial_cor) > 0.15 & upper.tri(partial_cor), arr.ind = TRUE)
  if (nrow(edges) > 0) {
    edge_list <- data.frame(
      from = rownames(partial_cor)[edges[,1]],
      to = colnames(partial_cor)[edges[,2]],
      weight = abs(partial_cor[edges])
    )
    
    g <- graph_from_data_frame(edge_list, directed = FALSE)
    
    # Node properties
    V(g)$size <- 5 + sqrt(degree[V(g)$name]) * 3
    V(g)$color <- ifelse(V(g)$name %in% hub_cpgs, "red", "lightblue")
    
    # Plot
    par(mfrow = c(1, 1))
    plot(g, 
         layout = layout_with_fr(g),
         vertex.label.cex = 0.7,
         edge.width = E(g)$weight * 3,
         main = "CpG Conditional Dependency Network")
    legend("topright", legend = c("Hub CpG", "Regular CpG"),
           pch = 21, pt.bg = c("red", "lightblue"), pt.cex = 2)
  }
}

8 SNP-CpG-Network Integration

# Analyze how meQTLs relate to network structure
active_snps <- which(rowSums(abs(Beta)) > threshold)

if (length(active_snps) > 0) {
  cat("\nIntegration Analysis:\n")
  cat("=====================\n")
  # For each active SNP, check which CpGs it affects
  for (i in active_snps[1:min(5, length(active_snps))]) {
    affected_cpgs <- which(abs(Beta[i,]) > threshold)
    if (length(affected_cpgs) > 1) {
      # Check if affected CpGs are connected in the network
      subnet_partial <- partial_cor[affected_cpgs, affected_cpgs]
      mean_connection <- mean(abs(subnet_partial[upper.tri(subnet_partial)]))
      
      cat("\n", rownames(Beta)[i], "affects", length(affected_cpgs), "CpGs\n")
      cat("Mean network connection among affected CpGs:", 
          round(mean_connection, 3), "\n")
      cat("Affected CpGs:", paste(colnames(Beta)[affected_cpgs], collapse = ", "), "\n")
    }
  }
}
#> 
#> Integration Analysis:
#> =====================
#> 
#>  rs1000001 affects 9 CpGs
#> Mean network connection among affected CpGs: 0.029 
#> Affected CpGs: cg2000001, cg2000003, cg2000007, cg2000010, cg2000011, cg2000012, cg2000013, cg2000014, cg2000017 
#> 
#>  rs1000002 affects 5 CpGs
#> Mean network connection among affected CpGs: 0 
#> Affected CpGs: cg2000001, cg2000003, cg2000007, cg2000012, cg2000018 
#> 
#>  rs1000003 affects 3 CpGs
#> Mean network connection among affected CpGs: 0 
#> Affected CpGs: cg2000003, cg2000017, cg2000020 
#> 
#>  rs1000005 affects 2 CpGs
#> Mean network connection among affected CpGs: 0 
#> Affected CpGs: cg2000013, cg2000020

9 Model Validation

9.1 Prediction Performance

# Split data for validation
n_train <- round(0.75 * n)
train_idx <- sample(n, n_train)
test_idx <- setdiff(1:n, train_idx)

# Refit on training data
cvfit_train <- cv.missoNet(
  X = sim$X[train_idx, ],
  Y = sim$Z[train_idx, ],
  kfold = 5,
  lambda.beta = lambda.beta.refined,
  lambda.theta = lambda.theta.refined,
  verbose = 0
)

# Predictions
Y_pred <- predict(cvfit_train, newx = sim$X[test_idx, ])
Y_test <- sim$Y[test_idx, ]  # True complete values

# Calculate performance metrics
mse_per_cpg <- colMeans((Y_pred - Y_test)^2)
cor_per_cpg <- sapply(1:q, function(j) cor(Y_pred[,j], Y_test[,j]))

# Visualization
par(mfrow = c(1, 2))

# MSE vs missing rate
plot(miss_by_cpg, mse_per_cpg, pch = 19, col = "darkblue",
     xlab = "Missing Rate", ylab = "Prediction MSE",
     main = "Prediction Error vs Missing Rate")
lines(lowess(miss_by_cpg, mse_per_cpg), col = "red", lwd = 2)

# Correlation vs missing rate
plot(miss_by_cpg, cor_per_cpg, pch = 19, col = "darkgreen",
     xlab = "Missing Rate", ylab = "Prediction Correlation",
     main = "Prediction Accuracy vs Missing Rate")
lines(lowess(miss_by_cpg, cor_per_cpg), col = "red", lwd = 2)


{
  cat("\nPrediction Performance:\n")
  cat("  Mean MSE:", round(mean(mse_per_cpg), 4), "\n")
  cat("  Mean correlation:", round(mean(cor_per_cpg), 3), "\n")
  cat("  Worst CpG MSE:", round(max(mse_per_cpg), 4), "\n")
  cat("  Best CpG correlation:", round(max(cor_per_cpg), 3), "\n")
}
#> 
#> Prediction Performance:
#>   Mean MSE: 0.9618 
#>   Mean correlation: 0.842 
#>   Worst CpG MSE: 1.4674 
#>   Best CpG correlation: 0.965

9.2 Stability Assessment

# Bootstrap stability (simplified for demonstration)
n_boot <- 10
selection_freq_beta <- matrix(0, p, q)
selection_freq_theta <- matrix(0, q, q)

for (b in 1:n_boot) {
  # Bootstrap sample
  boot_idx <- sample(n, replace = TRUE)
  
  # Fit model
  fit_boot <- missoNet(
    X = sim$X[boot_idx, ],
    Y = sim$Z[boot_idx, ],
    lambda.beta = final_model$lambda.beta,
    lambda.theta = final_model$lambda.theta,
    verbose = 0
  )
  
  # Track selections
  selection_freq_beta <- selection_freq_beta + (abs(fit_boot$est.min$Beta) > threshold)
  selection_freq_theta <- selection_freq_theta + 
    (abs(fit_boot$est.min$Theta) > edge_threshold)
}

# Normalize
selection_freq_beta <- selection_freq_beta / n_boot
selection_freq_theta <- selection_freq_theta / n_boot

# Identify stable features
stable_meqtls <- which(selection_freq_beta > 0.8, arr.ind = TRUE)
stable_edges <- which(selection_freq_theta > 0.8 & upper.tri(selection_freq_theta), 
                     arr.ind = TRUE)

{
  cat("\nStability Results:\n")
  cat("  Stable meQTLs (>80% selection):", nrow(stable_meqtls), "\n")
  cat("  Stable network edges (>80% selection):", nrow(stable_edges), "\n")
}
#> 
#> Stability Results:
#>   Stable meQTLs (>80% selection): 94 
#>   Stable network edges (>80% selection): 18

if (nrow(stable_meqtls) > 0) {
  cat("\nMost stable meQTL associations:\n")
  stable_df <- data.frame(
    SNP = colnames(sim$X)[stable_meqtls[,1]],
    CpG = colnames(sim$Z)[stable_meqtls[,2]],
    Frequency = selection_freq_beta[stable_meqtls]
  )
  print(head(stable_df[order(stable_df$Frequency, decreasing = TRUE), ], 10))
}
#> 
#> Most stable meQTL associations:
#>          SNP       CpG Frequency
#> 1  rs1000001 cg2000001         1
#> 2  rs1000009 cg2000001         1
#> 3  rs1000039 cg2000001         1
#> 4  rs1000040 cg2000001         1
#> 5  rs1000046 cg2000001         1
#> 7  rs1000014 cg2000002         1
#> 8  rs1000048 cg2000002         1
#> 9  rs1000001 cg2000003         1
#> 10 rs1000009 cg2000003         1
#> 12 rs1000040 cg2000003         1

10 Biological Interpretation

10.1 Gene Set Enrichment Analysis (Simulated)

# Annotate SNPs with genes (simulated)
active_snp_ids <- which(rowSums(abs(Beta)) > threshold)
if (length(active_snp_ids) > 0) {
  gene_names <- paste0("GENE", sample(1:50, length(active_snp_ids), replace = TRUE))
  
  snp_annotation <- data.frame(
    SNP = colnames(sim$X)[active_snp_ids],
    Gene = gene_names,
    Effect_Size = rowSums(abs(Beta[active_snp_ids, ]))
  )
  
  cat("Genes with meQTLs:\n")
  gene_summary <- aggregate(Effect_Size ~ Gene, snp_annotation, sum)
  gene_summary <- gene_summary[order(gene_summary$Effect_Size, decreasing = TRUE), ]
  print(head(gene_summary, 10))
}
#> Genes with meQTLs:
#>      Gene Effect_Size
#> 15 GENE27  12.4141659
#> 34 GENE50  10.9206360
#> 25 GENE41   7.9399606
#> 1   GENE1   6.7872450
#> 35  GENE6   6.1280365
#> 12 GENE24   5.4923039
#> 11 GENE21   4.9277246
#> 7  GENE16   3.6803628
#> 38  GENE9   3.2609485
#> 21 GENE33   0.2784541

# CpG annotation (simulated)
cpg_annotation <- data.frame(
  CpG = colnames(sim$Z),
  Region = sample(c("Promoter", "Gene Body", "Enhancer", "Intergenic"), 
                  q, replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1)),
  Island = sample(c("Island", "Shore", "Shelf", "Open Sea"),
                  q, replace = TRUE, prob = c(0.3, 0.2, 0.2, 0.3))
)

{
  cat("\nCpG distribution by genomic region:\n")
  print(table(cpg_annotation$Region))
}
#> 
#> CpG distribution by genomic region:
#> 
#>   Enhancer  Gene Body Intergenic   Promoter 
#>          3          6          2          9

{
  cat("\nCpG distribution by island status:\n")
  print(table(cpg_annotation$Island))
}
#> 
#> CpG distribution by island status:
#> 
#>   Island Open Sea    Shelf    Shore 
#>        9        6        3        2

11 Reporting and Export

11.1 Summary Report

#> 
#> ========================================
#>        ANALYSIS SUMMARY REPORT
#> ========================================
#> 
#> DATA CHARACTERISTICS:
#> --------------------
#> • Samples analyzed: 600
#> • SNPs tested: 80
#> • CpG sites measured: 20
#> • Missing data rate: 9.4%
#> • Missing pattern: MAR (technical dropout)
#> 
#> MODEL SELECTION:
#> ---------------
#> • Method: 5-fold cross-validation
#> • Selection criterion: 1-SE.Beta CV error
#> • Lambda (Beta): 0.4023
#> • Lambda (Theta): 0.0307
#> 
#> KEY FINDINGS:
#> ------------
#> • meQTLs identified: 69 / 80 SNPs
#> • SNP-CpG associations: 239
#> • CpG network edges: 15 / 190 possible
#> • Hub CpGs identified: 5
#> 
#> MODEL PERFORMANCE:
#> -----------------
#> • Mean prediction correlation: 0.842
#> • Mean prediction MSE: 0.9618
#> • Stability (bootstrap): 39% of associations stable
#> 
#> BIOLOGICAL INSIGHTS (SIMULATED):
#> -------------------
#> • Primary affected regions: Promoters and gene bodies
#> • Network structure suggests co-regulated CpG modules
#> • Hub CpGs may represent key regulatory sites

12 Conclusions

This case study demonstrated a complete genomic data analysis workflow using missoNet:

Data Preparation: Handled realistic missing patterns and correlation structures
Model Selection: Used two-stage approach with cross-validation
meQTL Discovery: Identified genetic variants affecting methylation
Network Analysis: Revealed co-regulation patterns among CpG sites
Validation: Assessed prediction accuracy and selection stability
Integration: Connected genetic effects with epigenetic networks

The missoNet framework successfully:

Handled 10-30% missing data without imputation
Identified sparse genetic effects in high-dimensional setting
Revealed network structure among responses
Provided stable, interpretable results

Case Study: Genomic Data Analysis with missoNet

Yixiao Zeng, Celia M. T. Greenwood

2025-09-02