knitr::opts_chunk$set(echo = TRUE, tidy = TRUE)
library(dplyr)
library(purrr)
library(tibble)
library(ggplot2)
library(popbio) # for projection.matrix()
library(raretrans)
# Raymond's theme modifications
rlt_theme <- theme(axis.title.y = element_text(colour="grey20",size=15,face="bold"),
axis.text.x = element_text(colour="grey20",size=15, face="bold"),
axis.text.y = element_text(colour="grey20",size=15,face="bold"),
axis.title.x = element_text(colour="grey20",size=15,face="bold"))The goal of this vignette is to demonstrate the use of package
raretrans on a single population and transition period.
raretrans assumes the projection matrix is a list of two
matrices, a transition matrix and a fertility matrix. This is the output
format of popbio::projection.matrix. If we have individual
transitions in a dataframe we can use
popbio::projection.matrix to obtain the required data. We
demonstrate with the transition and fertility data for the epiphytic
orchid Lepanthes elto POPNUM 250 in
period 5.
| 209 | 1 | 1 | 6 | 0 | 67 | j | j | 1 | 4 | TRUE | 4 | |
| 209 | 1 | 1 | 6 | 0 | 68 | a | a | 1 | 11 | TRUE | 11 | |
| 209 | 1 | 1 | 6 | 0 | 69 | a | a | 1 | 12 | TRUE | 12 | |
| 209 | 1 | 1 | 6 | 0 | 70 | a | a | 1 | 12 | TRUE | 12 | |
| 209 | 1 | 1 | 6 | 0 | 71 | j | a | 1 | 12 | TRUE | 12 | |
| 209 | 1 | 1 | 6 | 0 | 72 | a | a | 1 | 5 | TRUE | 5 |
Each row of this dataframe has columns for current stage, the next stage, and average per individual fertility. Note that “p” stands from “plantula” which means “seedlings” in Spanish. The first set of lines below change the name of the life history stage from “p” to “s” after selecting the population and time period.
onepop <- L_elto %>%
# Filter out population # 250, period (year) 5
filter(POPNUM == 250, year == 5) %>%
# redefine p for el plantón to s for seedling
mutate(stage = case_when(stage == "p" ~ "s", TRUE ~ stage), next_stage = case_when(next_stage ==
"p" ~ "s", TRUE ~ next_stage))
# popbio::projection.matrix doesn't like tibbles set TF = TRUE to get the right
# format.
TF <- popbio::projection.matrix(as.data.frame(onepop), stage = stage, fate = next_stage,
fertility = "fertility", sort = c("s", "j", "a"), TF = TRUE)
TF # This is the stage-structured population model## $T
##
## s j a
## s 0.0909 0.0000 0.0000
## j 0.6364 0.5745 0.0000
## a 0.0000 0.2979 0.8529
##
## $F
##
## s j a
## s 0.000 0.000 0.118
## j 0.000 0.000 0.000
## a 0.000 0.000 0.000
Our stages are now coded as s (seedling),
j (juvenile), and a (adult), and we
now have two matrices: T (stage transition) and
F (fecundity). The observed asymptotic population
growth rate is \(\lambda =\) 0.93. The
rare transitions missing from our first transition matrix,
TF$T, are the transition from seedling (s) to
adult (a), and the transition from j to s.
But we know that these happen.
Since our priors are based on counts (number of individuals, N) and the prior equivalent sampling size is expressed as a multiple of the number of individuals observed, we need to obtain the number of individuals in each stage (\(N\)) at the first time period.
We use function raretrans::get_state_vector() to obtain
the starting individual count, N.
N <- get_state_vector(onepop, stage = stage, sort = c("s", "j", "a"))
N # A vector of # of starting individuals for each stage## [1] 11 47 34
The list of matrices and vector of individual counts do not have to come from a dataframe as we’ve done here. As long as they have the expected format they can be created by hand. We use population 231 in period 2 as an example, splitting the matrix into transition T and fecundity F matrices. Below, “m” stands for “muerte” that is plants that are dead.
## $T
## stage
## fate p j a
## p 0.5000 0.0000 0.0000
## j 0.0000 0.8333 0.0000
## a 0.0000 0.0625 0.8750
##
## $F
## [,1] [,2] [,3]
## [1,] 0 0 0.125
## [2,] 0 0 0.000
## [3,] 0 0 0.000
## p j a
## 2 6 16
This matrix is missing the transition from seedling to juvenile, and none of the 6 juveniles died leading to an overestimate of survival. The observed asymptotic population growth rate is \(\lambda =\) 0.88. The matrix is not Ergodic (you can’t get to every other state from one or more states), and reducible, meaning one or more columns and rows can be discarded and have the same eigen properties.
Tremblay et al (in prep) show that a dirichlet prior works on the columns of the transition matrix (T), and a gamma prior works for entries in the fecundity matrix F.
So, let’s add a uniform dirichlet prior with weight = \(1\) to the transition matrix, \(T\). Here, we have 4 fates (3 + death), so each fate adds 0.25 to the matrix of observed fates (not the transition matrix!). When we specify a prior matrix for the transitions, there is one more row than columns. This extra row represents death.
Tprior <- matrix(0.25, byrow = TRUE, ncol = 3, nrow = 4)
fill_transitions(TF, N, P = Tprior) # returns transition matrix by default## [,1] [,2] [,3]
## [1,] 0.1042 0.00521 0.00714
## [2,] 0.6042 0.56771 0.00714
## [3,] 0.0208 0.29688 0.83571
We can get the same result ‘by hand’ - we need the vector of observations because the posterior is calculated from the observed transitions, not the transition matrix.
Tobs <- sweep(TF$T, 2, N, "*") # get the observed transitions
Tobs <- rbind(Tobs, N - colSums(Tobs)) # add the mortality row
Tobs <- Tobs + 0.25 # add the prior
sweep(Tobs, 2, colSums(Tobs), "/")[-4, ] # divide by the column sum, and discard mortalityrow## s j a
## s 0.1042 0.00521 0.00714
## j 0.6042 0.56771 0.00714
## a 0.0208 0.29688 0.83571
The uniform prior fills in the missing transitions, but also creates problems because it provides transition values that are biologically impossible. For example, it provides a transition for adult->seedling, when this transition is only possible in the fecundity matrix \(F\). For this reason we do not recommend using uniform priors.
We must specify the parameters for the fertility prior as a matrix.
Entries representing stages that are non-reproducing or outcome stages
that don’t occur by reproduction should be NA.
alpha <- matrix(c(NA_real_, NA_real_, 1e-05, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), nrow = 3, ncol = 3, byrow = TRUE)
beta <- matrix(c(NA_real_, NA_real_, 1e-05, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), nrow = 3, ncol = 3, byrow = TRUE)
fill_fertility(TF, N, alpha = alpha, beta = beta)##
## s j a
## s 0.000 0.000 0.118
## j 0.000 0.000 0.000
## a 0.000 0.000 0.000
The change in the fertility is < 0.0001 compared to the observed value. Calculating by hand, prior alpha is the number of observed offspring, and prior beta is the number of observed adults.
obs_offspring <- N[3] * TF$F[1, 3]
prior_alpha <- 1e-05
prior_beta <- 1e-05
posterior_alpha <- obs_offspring + prior_alpha
posterior_beta <- N[3] + prior_beta
posterior_alpha/posterior_beta # expected value## [1] 0.118
This demonstrates why the posterior point estimate of fecundity does not change much; the non-informative values for \(\alpha\) and \(\beta\) barely change the observed values.
Now we can put them together.
unif <- list(T = fill_transitions(TF, N), F = fill_fertility(TF, N, alpha = alpha,
beta = beta))
unif## $T
## [,1] [,2] [,3]
## [1,] 0.1042 0.00521 0.00714
## [2,] 0.6042 0.56771 0.00714
## [3,] 0.0208 0.29688 0.83571
##
## $F
##
## s j a
## s 0.000 0.000 0.118
## j 0.000 0.000 0.000
## a 0.000 0.000 0.000
The asymptotic population growth rate is now \(\lambda =\) 0.92. The growth rate shrinks slightly because applying the uniform prior to the transition probabilities causes the observed growth and survival transitions to shrink slightly relative to the unobserved transitions.
returnTypeBy default fill_transitions() returns the transition
matrix \(T\), and
fill_fertility() returns the fertility matrix \(F\). There are three other values that the
argument returnType can take:
fill_transitions(... returnType = "TN") can return an
augmented matrix of fates, which is useful for simulation. The fourth
row of this result (see below) is the mortality state.## [,1] [,2] [,3]
## [1,] 1.25 0.25 0.25
## [2,] 7.25 27.25 0.25
## [3,] 0.25 14.25 29.25
## [4,] 3.25 6.25 5.25
fill_fertility(... returnType = "ab") returns the alpha
and beta vectors of the posterior.## $alpha
##
## s j a
## s 4
## j
## a
##
## $beta
## [,1] [,2] [,3]
## [1,] NA NA 34
## [2,] NA NA NA
## [3,] NA NA NA
##
## s j a
## s 0.10417 0.00521 0.12479
## j 0.60417 0.56771 0.00714
## a 0.02083 0.29688 0.83571
To fix the problem of creating impossible transitions, we specify a more informative prior elicited from an expert on epiphytic orchids (RLT). It still has to have the same shape, one more row than columns.
RLT_Tprior <- matrix(c(0.25, 0.025, 0, 0.05, 0.9, 0.025, 0.01, 0.025, 0.95, 0.69,
0.05, 0.025), byrow = TRUE, nrow = 4, ncol = 3)Note that the 1st row, 3rd column is 0.0, because this transition is impossible. This prior is constructed so the columns sum to 1, which creates the greatest flexibility for weighting the prior. By default the weight is 1, interpreted as a prior sample size of 1.
## [,1] [,2] [,3]
## [1,] 0.104167 0.000521 0.000000
## [2,] 0.587500 0.581250 0.000714
## [3,] 0.000833 0.292187 0.855714
We can specify the weight as a multiple of the sample size for each stage.
## [,1] [,2] [,3]
## [1,] 0.14394 0.00833 0.00000
## [2,] 0.44091 0.68298 0.00833
## [3,] 0.00333 0.20691 0.88529
Here the prior is weighted half as much as the observed number of transitions. In this case, with only 2 transitions the prior effective sample size is still 1. If the number of observed transitions was larger, a prior weight of 0.5N would be larger than 1, but still allow the data to dominate.
The marginal posterior distribution of one element in a multinomial is a beta distribution, and we use this to get credible intervals on our transition rates. We can use the TN return type to get the parameters of the desired multinomial.
TN <- fill_transitions(TF, N, P = RLT_Tprior, priorweight = 0.5, returnType = "TN")
a <- TN[, 1] # change 1 to 2, 3 etc to get marginal distribution of other columns
b <- sum(TN[, 1]) - TN[, 1] # change 1 to 2, 3 etc to get marginal distribution of other columns
p <- a/(a + b)
lcl <- qbeta(0.025, a, b)
ucl <- qbeta(0.975, a, b)
knitr::kable(sprintf("%.3f (%.3f, %.3f)", p, lcl, ucl))| x |
|---|
| 0.144 (0.025, 0.343) |
| 0.441 (0.218, 0.677) |
| 0.003 (0.000, 0.038) |
| 0.412 (0.195, 0.649) |
Those are the point estimates (compare with first column above),
lower and upper \(95\%\) symmetric
credible intervals for transitions out of the seedling stage. There is a
high degree of uncertainty because of the small sample size (\(2\)), and low weight on the prior (\(1\)), leading to an effective sample size
of 3. If we increase the effective sample size to \(20\) by specifying:
priorweight\(= 9 (9*2 = 18 + 2 =
20)\) the symmetric credible intervals shrink quite a bit:
The transition rate from seedling to juvenile shrinks when the prior sample size is too large. In general the prior sample size should be less than the observed sample size.
Interactive Exploration: To visually test the effects of differing prior weights and explore posterior probability densities without writing code, you can launch the interactive Shiny app using
raretrans::run_app().
Obtaining credible intervals on the asymptotic growth rate, \(\lambda\), requires simulating matrices
from the posterior distributions. This is somewhat complicated to do
correctly, and we have written a function
raretrans::sim_transitions() to generate a list of
simulated matrices given the observed matrix and prior
specifications.
## [[1]]
## [,1] [,2] [,3]
## [1,] 0.2325 0.00125 0.622844
## [2,] 0.4258 0.67870 0.000621
## [3,] 0.0363 0.19846 0.955378
Now we simulate 1000 times, calculate the leading eigenvalue of each matrix, and draw a histogram.
set.seed(8390278) # make this part reproducible
alpha2 <- matrix(c(NA_real_, NA_real_, 0.025, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), nrow = 3, ncol = 3, byrow = TRUE)
beta2 <- matrix(c(NA_real_, NA_real_, 1, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), nrow = 3, ncol = 3, byrow = TRUE)
# generate 1000 matrices based on the prior transitions and fertilities plus
# the data
RLT_0.5 <- sim_transitions(TF, N, P = RLT_Tprior, alpha = alpha2, beta = beta2, priorweight = 0.5,
samples = 1000)
# extract the lambdas for each matrix
RLT_0.5 <- tibble(lposterior = map_dbl(RLT_0.5, lambda))
ggplot(data = RLT_0.5, mapping = aes(x = lposterior)) + geom_histogram(binwidth = 0.02) +
rlt_themeWe can calculate some summary statistics too. pincrease
is the probability that \(\lambda >
1\).
RLT_0.5_summary <- summarize(RLT_0.5, medianL = median(lposterior), meanL = mean(lposterior),
lcl = quantile(lposterior, probs = 0.025), ucl = quantile(lposterior, probs = 0.975),
pincrease = sum(lposterior > 1)/n())
knitr::kable(RLT_0.5_summary, digits = 2)| medianL | meanL | lcl | ucl | pincrease |
|---|---|---|---|---|
| 0.96 | 0.95 | 0.87 | 1.03 | 0.13 |
cri <- transition_CrI(TF, N, stage_names = c("plantula", "juvenile", "adult"))
plot_transition_CrI(cri)