Getting started

Thijs Janzen

2024-07-17

Using treestats

The treestats package provides an easy-to-use interface for calculating summary statistics on phylogenetic trees. To obtain a list of all supported summary statistics, use:

library(treestats)
list_statistics()
##  [1] "area_per_pair"          "average_leaf_depth"     "avg_ladder"            
##  [4] "avg_vert_depth"         "b1"                     "b2"                    
##  [7] "beta"                   "blum"                   "cherries"              
## [10] "colless"                "colless_corr"           "colless_quad"          
## [13] "crown_age"              "diameter"               "double_cherries"       
## [16] "eigen_centrality"       "eigen_centralityW"      "ew_colless"            
## [19] "four_prong"             "gamma"                  "i_stat"                
## [22] "il_number"              "imbalance_steps"        "j_one"                 
## [25] "j_stat"                 "laplace_spectrum_a"     "laplace_spectrum_e"    
## [28] "laplace_spectrum_g"     "laplace_spectrum_p"     "max_adj"               
## [31] "max_betweenness"        "max_closeness"          "max_closenessW"        
## [34] "max_del_width"          "max_depth"              "max_ladder"            
## [37] "max_laplace"            "max_width"              "mean_branch_length"    
## [40] "mean_branch_length_ext" "mean_branch_length_int" "min_adj"               
## [43] "min_laplace"            "mntd"                   "mpd"                   
## [46] "mw_over_md"             "nltt_base"              "number_of_lineages"    
## [49] "phylogenetic_div"       "pigot_rho"              "pitchforks"            
## [52] "psv"                    "rogers"                 "root_imbalance"        
## [55] "rquartet"               "sackin"                 "stairs"                
## [58] "stairs2"                "symmetry_nodes"         "tot_coph"              
## [61] "tot_internal_path"      "tot_path"               "tree_height"           
## [64] "treeness"               "var_branch_length"      "var_branch_length_ext" 
## [67] "var_branch_length_int"  "var_depth"              "vpd"                   
## [70] "wiener"

If your favourite summary statistic is missing, please let the maintainer know: treestats is a dynamic package that is always under development, and the maintainers are always looking for new statistics!

Given a phylogenetic tree, you can now use any of the available functions to calculate your summary statistic of choice. Take, for instance, the Colless statistic, applied here to a dummy birth-death tree with 100 tips:

phy <- ape::rphylo(n = 100, birth = 1, death = 0.1)

treestats::colless(phy)
## [1] 312
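The Colless index sums, over every internal node, the absolute difference between the numbers of tips in its two daughter subtrees. The base R sketch below illustrates that definition; colless_sketch() is a hypothetical helper, not part of treestats, and it assumes a fully bifurcating tree stored as an ape-style edge matrix (tips numbered 1..n, root numbered n + 1).

```r
# Hypothetical helper (not part of treestats): unnormalized Colless index.
# Assumes a fully bifurcating tree given as an ape-style edge matrix:
# column 1 = parent node, column 2 = child; tips are 1..n_tips and the
# root is n_tips + 1.
colless_sketch <- function(edge, n_tips) {
  total <- 0
  # Returns the number of tips below `node`; for every internal node it
  # visits, adds |left subtree size - right subtree size| to `total`.
  count_tips <- function(node) {
    if (node <= n_tips) return(1L)
    children <- edge[edge[, 1] == node, 2]
    sizes <- vapply(children, count_tips, integer(1))
    total <<- total + abs(sizes[1] - sizes[2])
    sum(sizes)
  }
  count_tips(n_tips + 1L)
  total
}

# Caterpillar tree ((((1,2),3),4)): maximally unbalanced
cat_edge <- rbind(c(5, 6), c(5, 4), c(6, 7), c(6, 3), c(7, 1), c(7, 2))
# Fully balanced tree ((1,2),(3,4))
bal_edge <- rbind(c(5, 6), c(5, 7), c(6, 1), c(6, 2), c(7, 3), c(7, 4))

colless_sketch(cat_edge, 4)
## [1] 3
colless_sketch(bal_edge, 4)
## [1] 0
```

For an ape tree such as phy, the same helper could be called as colless_sketch(phy$edge, ape::Ntip(phy)), which should agree with treestats::colless(phy) as long as the tree is fully bifurcating.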

The documentation of the Colless statistic (?colless) shows that the function also offers options to normalize for tree size: either ‘pda’ or ‘yule’:

treestats::colless(phy, normalization = "yule")
## [1] -0.3692387
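The Yule-normalized value above can be reproduced by hand, assuming the ‘yule’ option applies the usual correction (C - n log(n) - n(gamma - 1 - log(2))) / n, where C is the raw Colless index, n the number of tips and gamma the Euler-Mascheroni constant. Plugging in C = 312 and n = 100 gives the same number:

```r
# Hedged sketch: recompute the Yule-normalized Colless index by hand,
# assuming the correction (C - n*log(n) - n*(gamma - 1 - log(2))) / n.
colless_raw <- 312          # treestats::colless(phy) from above
n_tips      <- 100          # number of tips in phy
euler_gamma <- -digamma(1)  # Euler-Mascheroni constant, about 0.5772157

colless_yule <- (colless_raw - n_tips * log(n_tips) -
                   n_tips * (euler_gamma - 1 - log(2))) / n_tips
colless_yule
## [1] -0.3692387
```

A value close to zero means the tree is about as balanced as expected under the Yule model; negative values indicate a tree that is more balanced than that expectation.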

Multiple statistics

The treestats package supports calculating many statistics in one go, and several functions are provided for this purpose. Firstly, the function calc_all_stats calculates all statistics:

all_stats <- calc_all_stats(phy)

Similarly, we can apply all topology-associated summary statistics at once:

balance_stats <- calc_topology_stats(phy)
unlist(balance_stats)
##      area_per_pair average_leaf_depth         avg_ladder     avg_vert_depth 
##       1.339475e+01       8.180000e+00       2.000000e+00       7.226131e+00 
##                 b1                 b2               beta               blum 
##       5.520837e+01       5.298096e+00       5.710937e-02       1.126063e+02 
##           cherries            colless       colless_corr       colless_quad 
##       3.600000e+01       3.120000e+02       6.431664e-02       5.596000e+03 
##           diameter    double_cherries   eigen_centrality         ew_colless 
##       2.300000e+01       5.000000e+00       2.796905e-01       4.195618e-01 
##         four_prong             i_stat          il_number    imbalance_steps 
##       3.000000e+00       4.753022e-01       2.800000e+01       8.600000e+01 
##              j_one    max_betweenness      max_closeness      max_del_width 
##       8.122074e-01       1.207100e+04                 NA       1.200000e+01 
##          max_depth         max_ladder          max_width         mw_over_md 
##       1.300000e+01       2.000000e+00       3.000000e+01       2.307692e+00 
##         pitchforks             rogers     root_imbalance           rquartet 
##       1.400000e+01       5.500000e+01       5.500000e-01       4.828989e+06 
##             sackin             stairs            stairs2     symmetry_nodes 
##       8.180000e+02       5.555556e-01       6.718059e-01       5.600000e+01 
##           tot_coph  tot_internal_path    tot_path_length          var_depth 
##       7.339000e+03       6.200000e+02       1.438000e+03       4.654141e+00
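Some statistics can return NA for a particular tree, as max_closeness does above. A small base R sketch of how the unlisted result could be screened: the vector below copies a handful of entries from that output and stands in for unlist(balance_stats).

```r
# Stand-in for unlist(balance_stats): a few entries copied from the output
# above, including the NA reported for max_closeness.
stats_vec <- c(colless = 312, max_closeness = NA, sackin = 818,
               average_leaf_depth = 8.18)

# Which statistics did not return a value for this tree?
names(stats_vec)[is.na(stats_vec)]
## [1] "max_closeness"

# Keep only the statistics that returned a value
complete_stats <- stats_vec[!is.na(stats_vec)]
complete_stats
```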