
sshist

The sshist package implements the Shimazaki-Shinomoto method for finding the optimal number of bins in histograms.
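As a rough illustration of the idea, the selection criterion published by Shimazaki and Shinomoto can be sketched in a few lines of base R. For each candidate number of equal-width bins, compute the mean k and the biased variance v of the bin counts, evaluate the cost (2k - v) / width^2, and keep the candidate with the lowest cost. The helper name ss_cost and the candidate range 2:50 are assumptions made for this sketch. It is not the package's implementation, and its result need not match what sshist() returns.

```r
# Hypothetical helper (not part of sshist): Shimazaki-Shinomoto cost
# for a histogram with n_bins equal-width bins over the range of x.
ss_cost <- function(x, n_bins) {
  edges <- seq(min(x), max(x), length.out = n_bins + 1)
  width <- (max(x) - min(x)) / n_bins
  # Assign each observation to a bin; all.inside keeps max(x) in the last bin
  idx <- findInterval(x, edges, rightmost.closed = TRUE, all.inside = TRUE)
  counts <- tabulate(idx, nbins = n_bins)
  k_mean <- mean(counts)
  k_var <- mean((counts - k_mean)^2)  # biased variance: divide by n_bins
  (2 * k_mean - k_var) / width^2
}

x <- faithful$waiting
candidates <- 2:50  # assumed search range for this sketch
costs <- vapply(candidates, function(n) ss_cost(x, n), numeric(1))
best_n <- candidates[which.min(costs)]
best_n
```

The cost rewards bins whose counts vary strongly (large v, meaning the histogram captures real structure) and penalizes bins that are so narrow that the counts are dominated by sampling noise.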

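Unlike rules of thumb such as Sturges' rule (the default in base R's hist()) or the Freedman-Diaconis rule, and unlike the fixed 30 bins that ggplot2's geom_histogram() uses by default, this method minimizes an estimate of the expected L2 loss (the mean integrated squared error) between the histogram and the unknown underlying density function. It is particularly effective for: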

Installation

# stable version from CRAN
install.packages("sshist")

You can install the development version of sshist from GitHub like so:

# install.packages("devtools")
devtools::install_github("celebithil/sshist")

Example 1: Basic 1D Usage

Here is a basic example using the Old Faithful Geyser data.

library(sshist)

# Load data
data(faithful)
x_data <- faithful$waiting

# Calculate optimal binning
res <- sshist(x_data)

# Print summary
print(res)
#> Shimazaki-Shinomoto Histogram Optimization
#> ------------------------------------------
#> Optimal Bins (N): 37 
#> Bin Width (D):    1.432 
#> Cost Minimum:     -9.681

# Plot on the density scale using the optimal bin edges
hist(res$data, breaks = res$edges, freq = FALSE,
     main = paste0("Optimal Hist (N = ", res$opt_n, ")"),
     col = "lightblue", border = "white", xlab = "Data")
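For context, the bin counts suggested by the classical rules that ship with base R can be computed with nclass.Sturges(), nclass.scott() and nclass.FD() from grDevices and set against the value of N reported above. This is only a point of comparison; the variable name rule_bins is a choice made for this sketch.

```r
x <- faithful$waiting

# Bin counts suggested by the rules of thumb built into base R
rule_bins <- c(
  Sturges = nclass.Sturges(x),
  Scott = nclass.scott(x),
  FreedmanDiaconis = nclass.FD(x)
)
rule_bins
```

Keep in mind that hist() treats a bin count passed via breaks as a suggestion and may move to "pretty" break points, whereas passing explicit edges, as in the call above, fixes the bins exactly.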

Example 2: Integration with ggplot2

sshist calculates the optimal parameters, which you can easily pass to ggplot2.

library(ggplot2)

# Create a data frame
df <- data.frame(waiting = x_data)

ggplot(df, aes(x = waiting)) +
  geom_histogram(breaks = res$edges, fill = "#69b3a2", color = "white", alpha = 0.8) +
  geom_rug(alpha = 0.1) +
  ggtitle(paste0("Shimazaki-Shinomoto Optimization (N = ", res$opt_n, ")")) +
  theme_minimal()

Example 3: 2D Histogram Optimization

For bivariate data, sshist_2d finds the optimal binning for both X and Y axes simultaneously.

# Second variable for the bimodal 2D data (x_data comes from Example 1)
y_data <- faithful$eruptions

# Optimize
res2d <- sshist_2d(x_data, y_data)

# Print summary
print(res2d)
#> Shimazaki-Shinomoto 2D Histogram Optimization
#> ---------------------------------------------
#> Optimal Bins X:   9 
#> Optimal Bins Y:   20 
#> Bin Width X:      5.889 
#> Bin Width Y:      0.175 
#> Cost Minimum:     -5.717
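The printed bin widths are consistent with equal-width bins spanning the range of each variable (for example, (96 - 43) / 9 is about 5.889 for the waiting times). Under that assumption, the 2D counts can be tabulated and drawn in base R as sketched here. The bin counts nx and ny are copied from the printed summary above rather than read from res2d, and the names x_edges, y_edges and counts are choices made for this sketch.

```r
x <- faithful$waiting
y <- faithful$eruptions
nx <- 9L   # "Optimal Bins X" from the printed summary
ny <- 20L  # "Optimal Bins Y" from the printed summary

# Equal-width bin edges over the data range (an assumption of this sketch)
x_edges <- seq(min(x), max(x), length.out = nx + 1)
y_edges <- seq(min(y), max(y), length.out = ny + 1)

# Bin index of every observation along each axis
ix <- findInterval(x, x_edges, rightmost.closed = TRUE, all.inside = TRUE)
iy <- findInterval(y, y_edges, rightmost.closed = TRUE, all.inside = TRUE)

# nx-by-ny matrix of counts (column-major cell index)
counts <- matrix(tabulate(ix + (iy - 1L) * nx, nbins = nx * ny), nrow = nx)

# Draw the counts; image() accepts cell boundaries for x and y
image(x_edges, y_edges, counts,
      xlab = "Waiting Time (min)", ylab = "Eruption Duration (min)",
      main = "2D bin counts")
```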

Example 4: 2D Optimization with ggplot2

You can easily use the optimized bin counts from sshist_2d in ggplot2 by passing them to the bins argument in geom_bin2d.

# Recompute the 'res2d' object from Example 3 (optimal bins for the
# Old Faithful data) so that this example runs on its own

res2d <- sshist_2d(faithful$waiting, faithful$eruptions)

ggplot(faithful, aes(waiting, eruptions)) +
  geom_bin2d(bins = c(res2d$opt_nx, res2d$opt_ny)) +
  scale_fill_distiller(palette = "Spectral") +
  labs(
    title = "Optimal 2D Binning (Old Faithful)",
    subtitle = paste0("Shimazaki-Shinomoto Method: ", 
                      res2d$opt_nx, " x ", res2d$opt_ny, " bins"),
    x = "Waiting Time (min)",
    y = "Eruption Duration (min)"
  ) +
  theme(axis.text = element_text(size = 12),
        title = element_text(size = 12, face = "bold"),
        panel.border = element_rect(linewidth = 2, color = "black", fill = NA))

References