| Title: | Offline Taxonomic Name Matching Against Local Darwin Core Snapshots |
| Version: | 0.2.12 |
| Description: | Match taxonomic names against locally stored Darwin Core backbone databases ('WFO', 'COL', 'GBIF', 'ITIS', 'NCBI Taxonomy', 'Open Tree of Life', 'WoRMS', 'Species Fungorum', 'AlgaeBase'). Provides offline fuzzy and exact matching with synonym resolution, hybrid name detection, and a unified output schema across all sources. All heavy computation runs in the 'vectra' C11 columnar engine. |
| License: | MIT + file LICENSE |
| URL: | https://github.com/gcol33/taxify |
| BugReports: | https://github.com/gcol33/taxify/issues |
| Additional_repositories: | https://gcol33.r-universe.dev |
| Depends: | R (≥ 4.1.0) |
| Imports: | curl, jsonlite, rlang, vectra |
| Suggests: | DBI, knitr, openxlsx2, rmarkdown, RSQLite, taxifydb, testthat (≥ 3.0.0), TR8 |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| Config/roxygen2/version: | 8.0.0 |
| NeedsCompilation: | no |
| Packaged: | 2026-06-24 12:03:56 UTC; Gilles Colling |
| Author: | Gilles Colling |
| Maintainer: | Gilles Colling <gilles.colling051@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-30 11:30:07 UTC |
taxify: Offline Taxonomic Name Matching Against Local Darwin Core Snapshots
Description
Match taxonomic names against locally stored Darwin Core backbone databases ('WFO', 'COL', 'GBIF', 'ITIS', 'NCBI Taxonomy', 'Open Tree of Life', 'WoRMS', 'Species Fungorum', 'AlgaeBase'). Provides offline fuzzy and exact matching with synonym resolution, hybrid name detection, and a unified output schema across all sources. All heavy computation runs in the 'vectra' C11 columnar engine.
Author(s)
Maintainer: Gilles Colling <gilles.colling051@gmail.com> (ORCID) [copyright holder]
Authors:
Gilles Colling <gilles.colling051@gmail.com> (ORCID) [copyright holder]
See Also
Useful links:
- https://github.com/gcol33/taxify
- Report bugs at https://github.com/gcol33/taxify/issues
Add macroalgal functional traits (AlgaeTraits)
Description
Joins AlgaeTraits (Vranken et al. 2023) macroalgal functional traits to a
taxify() result by looking up accepted_name. AlgaeTraits provides
morphological, ecological, and life-history traits for European seaweeds.
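add_algae_traits(), like most enrichments in this manual, is described as a lookup on accepted_name. As a rough mental model only (taxify's actual join runs in vectra and may behave differently), the base-R sketch below performs such a lookup. lookup_join() and both toy tables are hypothetical, with placeholder names rather than real AlgaeTraits records.

```r
# Hypothetical sketch: a left "lookup" join on accepted_name in base R.
# The toy tables are placeholders, not real taxify or AlgaeTraits data.
result <- data.frame(
  input_name    = c("Species A", "Species B", "Unresolved name"),
  accepted_name = c("Species A", "Species B", NA)
)
traits <- data.frame(
  accepted_name     = c("Species A", "Species C"),
  algae_growth_form = c("filamentous", "crustose")
)

lookup_join <- function(x, tbl, key = "accepted_name") {
  # match() keeps the row order of x; incomparables = NA stops an
  # unresolved (NA) name from matching an NA key in tbl.
  idx <- match(x[[key]], tbl[[key]], incomparables = NA)
  for (col in setdiff(names(tbl), key)) {
    x[[col]] <- tbl[[col]][idx]  # unmatched rows receive NA
  }
  x
}

enriched <- lookup_join(result, traits)
enriched$algae_growth_form
```

Using match() rather than merge() keeps the input rows in their original order and never duplicates them, which suits a trait table with one row per species.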
Usage
add_algae_traits(x, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| verbose | Logical. Default TRUE. |
Details
Source: AlgaeTraits (Vranken et al. 2023, VLIZ Marine Data Archive, CC BY 4.0). Coverage: ~1,745 European macroalgae species.
Value
The same data.frame with additional columns:
- algae_body_size_cm
Maximum body size in centimetres.
- algae_growth_form
Growth form / body shape (e.g., filamentous, foliose, crustose).
- algae_calcification
Calcification type (e.g., uncalcified, articulated, encrusting).
- algae_life_span
Life span category (annual, perennial, etc.).
- algae_tidal_zone
Tidal zonation (e.g., supralittoral, eulittoral, sublittoral).
- algae_wave_exposure
Wave exposure tolerance (sheltered, moderately exposed, exposed).
- algae_environment
Habitat environment (marine, brackish, freshwater).
- algae_substrate
Environmental position / substrate type.
References
Vranken S et al. (2023) AlgaeTraits: a trait database for (European) seaweeds. Earth System Science Data 15:2711-2754. doi:10.5194/essd-15-2711-2023
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Fucus vesiculosus", backend = "gbif") |>
add_algae_traits()
options(old)
Add alien species first record years
Description
Joins alien species first record data to a taxify() result, filtered
by country. Data from the Global Alien Species First Record Database
(Seebens et al. 2017).
Usage
add_alien_first_records(x, country, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| country | Character. ISO 3166-1 alpha-2 country code(s), or |
| verbose | Logical. Default TRUE. |
Details
Source: Global Alien Species First Record Database v3.1 (Seebens et al. 2017, Nature Communications 8, 14435). CC BY 4.0. Coverage: ~77k species x country combinations across all taxa.
Value
The same data.frame with additional column(s):
- alien_first_record
Year of the first record (integer), or NA if not recorded for that country.
- alien_first_record_source
Database that contributed the record (e.g., "GAVIA", "CABI ISC").
- alien_first_record_reference
Original citation or reference for the record.
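Because the first-record data hold one row per species and country, the country filter has to be applied before the lookup. A minimal sketch for a single country code, assuming a long table with hypothetical columns accepted_name, country and alien_first_record (toy years, not GAFRD records); first_record_for() is a hypothetical helper, and how taxify lays out several country codes at once is not shown here.

```r
first_records <- data.frame(
  accepted_name      = c("Species A", "Species A", "Species B"),
  country            = c("AT", "DE", "DE"),
  alien_first_record = c(1901L, 1955L, 1987L)  # toy years
)
result <- data.frame(accepted_name = c("Species A", "Species B"))

first_record_for <- function(x, records, country) {
  # Keep only the requested country, then look up each accepted name.
  sub <- records[records$country == country, , drop = FALSE]
  idx <- match(x$accepted_name, sub$accepted_name, incomparables = NA)
  x$alien_first_record <- sub$alien_first_record[idx]
  x
}

first_record_for(result, first_records, "AT")$alien_first_record
```

Species B has no "AT" row, so it receives NA, mirroring the documented behaviour for species not recorded in a country.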
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Robinia pseudoacacia") |>
add_alien_first_records(country = "AT")
taxify(c("Robinia pseudoacacia", "Ailanthus altissima")) |>
add_alien_first_records(country = c("AT", "DE"))
options(old)
Add amphibian life-history traits (AmphiBIO)
Description
Joins AmphiBIO amphibian life-history and ecological traits to a
taxify() result by looking up accepted_name.
Usage
add_amphibio(x, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| verbose | Logical. Default TRUE. |
Details
Source: AmphiBIO (Oliveira et al. 2017, CC BY 4.0). Coverage: ~6,800 amphibian species. Amphibians only.
Value
The same data.frame with additional columns:
- body_size_mm
Maximum body size in mm (snout-vent length).
- age_maturity_d
Age at maturity in days.
- longevity_d
Maximum longevity in days.
- litter_size
Clutch/litter size.
- reproductive_output
Reproductive output per year.
- offspring_size_mm
Offspring size in mm.
- direct_development
Direct development (0/1).
- larval
Has larval stage (0/1).
- aquatic
Aquatic habitat (0/1).
- fossorial
Fossorial habitat (0/1).
- arboreal
Arboreal habitat (0/1).
- diurnal
Diurnal activity (0/1).
- nocturnal_amphibio
Nocturnal activity (0/1). Named nocturnal_amphibio to avoid a collision with the EltonTraits nocturnal column (see add_elton_traits()).
References
Oliveira BF, Sao-Pedro VA, Santos-Barrera G, Penone C, Costa GC (2017) AmphiBIO, a global database for amphibian ecological traits. Scientific Data 4:170123.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Bufo bufo", backend = "gbif") |>
add_amphibio()
options(old)
Add longevity and life-history traits (AnAge)
Description
Joins AnAge (Animal Ageing and Longevity Database) traits to a
taxify() result by looking up accepted_name.
Usage
add_anage(x, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| verbose | Logical. Default TRUE. |
Details
Source: AnAge (de Magalhaes & Costa 2009, CC BY). Coverage: ~4.7k vertebrate species (mammals, birds, reptiles, amphibians, fish).
Value
The same data.frame with additional columns:
- max_longevity_yr
Maximum longevity in years.
- anage_body_mass_g
Adult body mass in grams.
- metabolic_rate_w
Basal metabolic rate in watts.
- female_maturity_d
Female age at sexual maturity in days.
- male_maturity_d
Male age at sexual maturity in days.
- gestation_incubation_d
Gestation or incubation length in days.
- anage_litter_size
Litter or clutch size.
- birth_mass_g
Mass at birth in grams.
- growth_rate
Growth rate (1/days).
- temperature_k
Body temperature in Kelvin.
References
de Magalhaes JP, Costa J (2009) A database of vertebrate longevity records and their relation to other life-history traits. Journal of Evolutionary Biology 22:1770-1774.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Vulpes vulpes", backend = "gbif") |>
add_anage()
options(old)
Add cross-taxon body mass and metabolic rate (AnimalTraits)
Description
Joins AnimalTraits body mass and metabolic rate data to a taxify()
result by looking up accepted_name.
Usage
add_animaltraits(x, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| verbose | Logical. Default TRUE. |
Details
Source: AnimalTraits (Hebert et al. 2022, CC0). Coverage: ~2k species across arthropods, vertebrates, molluscs, and annelids. Individual-level observations aggregated to species medians.
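Aggregating individual observations to one value per species is what makes the accepted_name lookup one-to-one. A sketch of median aggregation with base R's aggregate(); the observations are toy values and the column names are hypothetical.

```r
obs <- data.frame(
  species      = c("Species A", "Species A", "Species A", "Species B"),
  body_mass_kg = c(0.010, 0.012, 0.030, 2.5)  # toy values
)

# One row per species; the median is less sensitive than the mean
# to the larger 0.030 observation.
medians <- aggregate(body_mass_kg ~ species, data = obs, FUN = median)
medians
```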
Value
The same data.frame with additional columns:
- animaltraits_body_mass_kg
Median body mass in kg.
- animaltraits_metabolic_rate_w
Median metabolic rate in watts.
References
Hebert K et al. (2022) AnimalTraits – a curated animal trait database for body mass, metabolic rate and brain size. Scientific Data 9:265.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Drosophila melanogaster", backend = "gbif") |>
add_animaltraits()
options(old)
Add arthropod life-history traits (NW European Arthropods)
Description
Joins the Northwestern European Arthropod Life Histories dataset to a
taxify() result by looking up accepted_name.
Usage
add_arthropod_traits(x, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| verbose | Logical. Default TRUE. |
Details
Source: Logghe et al. (2025, CC BY-NC). Coverage: ~4.9k arthropod species from NW Europe across 10 orders (Coleoptera, Hemiptera, Orthoptera, Araneae, Diptera, Hymenoptera, Lepidoptera, etc.).
Value
The same data.frame with additional columns:
- arthropod_body_size_mm
Body size in mm.
- arthropod_dispersal
Dispersal ability (0–1 ratio within order).
- arthropod_voltinism
Mean number of generations per year.
- arthropod_fecundity
Fecundity (number of eggs/offspring).
- arthropod_development_d
Development time in days.
- arthropod_lifespan_d
Adult lifespan in days.
- arthropod_thermal_mean
Mean thermal niche (degrees C).
- arthropod_diurnality
Activity period (diurnal/nocturnal/both).
- arthropod_feeding_guild
Feeding guild of adult.
- arthropod_trophic_range
Trophic range of adult (specialist/generalist).
References
Logghe A et al. (2025) An in-depth dataset of northwestern European arthropod life histories and ecological traits. Biodiversity Data Journal 13:e146785.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Abax parallelepipedus", backend = "gbif") |>
add_arthropod_traits()
options(old)
Add bird morphology and migration (AVONET)
Description
Joins AVONET species-level averages for bird morphology, ecology,
and migration to a taxify() result by looking up accepted_name.
Usage
add_avonet(x, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| verbose | Logical. Default TRUE. |
Details
Source: AVONET (Tobias et al. 2022, Figshare, CC BY 4.0). Coverage: ~11k bird species. Birds only.
Value
The same data.frame with additional columns:
- beak_length
Beak length in mm (culmen, species mean).
- beak_depth
Beak depth in mm (species mean).
- wing_length
Wing length in mm (species mean).
- tail_length
Tail length in mm (species mean).
- tarsus_length
Tarsus length in mm (species mean).
- avonet_body_mass_g
Body mass in grams (species mean).
- hand_wing_index
Hand-wing index (pointedness, species mean).
- habitat
Primary habitat classification.
- trophic_level
Trophic level classification.
- trophic_niche
Trophic niche classification.
- migration
Migration strategy: "sedentary", "partial", or "full".
References
Tobias JA et al. (2022) AVONET: morphological, ecological and geographical data for all birds. Ecology Letters 25:581-597.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Parus major", backend = "gbif") |>
add_avonet()
options(old)
Add plant traits from Baseflor (Catminat / Julve)
Description
Joins Baseflor (Julve, Programme Catminat) plant traits to a taxify()
result by looking up accepted_name. Baseflor covers the vascular flora of
France and neighbouring regions, providing flowering phenology, pollination
and breeding biology, dispersal mode, and floral and fruit morphology.
Usage
add_baseflor(x, verbose = TRUE)
Arguments
x |
A data.frame returned by |
verbose |
Logical. Default |
Details
Source: Baseflor, Programme Catminat (Julve 1998 ff.). Coverage: ~7,000 vascular plant taxa of France and neighbouring regions. Data are released under ODbL 1.0 / CC BY-SA 2.0.
For ecological indicator values on the light, temperature, moisture,
reaction, and nutrient axes, see add_eive() (European calibration). For
Raunkiaer life form and seed, leaf, and clonality traits of the Northwest
European flora, see add_leda().
Value
The same data.frame with additional columns:
- flower_begin_month
First month of flowering (1-12).
- flower_end_month
Last month of flowering (1-12). A value smaller than flower_begin_month denotes a flowering period that wraps across the new year (e.g. begin 10, end 6).
- pollination_vector
Pollination vector(s): insect, wind, water, self, apogamy. Comma-separated when more than one applies.
- dispersal_mode
Diaspore dispersal mode(s): anemochory, barochory, epizoochory, endozoochory, myrmecochory, hydrochory, autochory, dyszoochory. Comma-separated when more than one applies.
- breeding_system
Sexual system: hermaphroditic, monoecious, dioecious, gynodioecious, androdioecious, gynomonoecious, polygamous.
- flower_colour
Flower colour(s): white, yellow, pink, green, blue, brown, black. Comma-separated when more than one applies.
- fruit_type
Fruit type: achene, capsule, caryopsis, drupe, legume, silique, berry, follicle, cone, samara, pyxid.
- woody_growth_form
Woody growth form for woody taxa: tree, small tree, large tree, shrub, bush, subshrub, liana, parasite. NA for non-woody (herbaceous) taxa.
- continentality
Ellenberg-style continentality indicator value (1-9), an axis not covered by EIVE.
- salinity
Ellenberg-style salinity indicator value (0-9), an axis not covered by EIVE.
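Because flower_end_month can be smaller than flower_begin_month, a naive begin <= month <= end test misses wrap-around periods. A sketch of wrap-aware helpers; in_flowering() and flowering_length() are hypothetical names, not taxify functions.

```r
# TRUE when month m (1-12) falls inside the flowering window.
# If end < begin, the window wraps across the new year (e.g. 10 -> 6).
in_flowering <- function(m, begin, end) {
  wraps <- end < begin
  (!wraps & m >= begin & m <= end) | (wraps & (m >= begin | m <= end))
}

# Number of flowering months, counting both endpoints.
flowering_length <- function(begin, end) ((end - begin) %% 12) + 1

in_flowering(c(1, 7, 11), begin = 10, end = 6)
flowering_length(10, 6)
```

For begin 10 and end 6 the window runs October through June, so month 7 is excluded and the period spans nine months.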
References
Julve, Ph. (1998 ff.) baseflor. Index botanique, ecologique et chorologique de la Flore de France. Programme Catminat.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Bellis perennis") |>
add_baseflor()
options(old)
Add COL-specific columns
Description
Joins extra Catalogue of Life columns to a taxify() result by
looking up taxon_id in the COL backbone. Only enriches rows where
backend == "col".
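One plausible reason for restricting the enrichment to backend == "col" is that taxon_id values from different backbones are not comparable: an ID from another backend could coincide with an unrelated COL ID. A sketch of such a guard with toy IDs; enrich_col_only() is hypothetical and adds only a kingdom column for brevity.

```r
result <- data.frame(
  taxon_id = c("T1", "T2", "T1"),
  backend  = c("col", "gbif", "col")
)
col_extra <- data.frame(
  taxon_id = c("T1", "T2"),
  kingdom  = c("Plantae", "Fungi")  # toy values
)

enrich_col_only <- function(x, extra) {
  idx <- match(x$taxon_id, extra$taxon_id, incomparables = NA)
  # %in% treats a missing backend as "not col" instead of returning NA.
  idx[!(x$backend %in% "col")] <- NA
  x$kingdom <- extra$kingdom[idx]
  x
}

enrich_col_only(result, col_extra)$kingdom
```

The GBIF row's ID "T2" happens to exist in the toy COL table, yet it stays NA because the backend check runs first.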
Usage
add_col_info(x)
Arguments
| x | A data.frame returned by taxify(). |
Value
The same data.frame with additional columns:
- notho
Hybrid type from COL: "generic", "specific", "infrageneric", or "infraspecific".
- nomenclaturalCode
Nomenclatural code ("ICN", "ICZN", etc.).
- nomenclaturalStatus
Nomenclatural status.
- namePublishedIn
Original publication reference.
- kingdom
Kingdom classification.
- phylum
Phylum classification.
- col_class
Class classification (renamed to avoid conflict with R's class function).
- order
Order classification.
- infraspecificEpithet
Infraspecific epithet.
- is_extinct
Logical. Whether the species is extinct (from SpeciesProfile, if available).
- is_marine
Logical. Whether the species is marine.
- is_freshwater
Logical. Whether the species is freshwater.
- is_terrestrial
Logical. Whether the species is terrestrial.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur", backend = "col") |>
add_col_info()
options(old)
Add common (vernacular) names
Description
Joins vernacular names to a taxify() result by looking up
accepted_name, filtered by language.
Usage
add_common_names(x, lang = "en", verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| lang | Character. ISO 639-1 language code (e.g., "en" or "de"). Default "en". |
| verbose | Logical. Default TRUE. |
Details
Common names are merged from three sources:
- GBIF backbone vernacular names (CC0): multi-language via ISO 639-1 codes.
- NCBI Taxonomy common names (public domain): no language tag (lang = NA).
- Open Tree of Life common names (CC0): no language tag (lang = NA).
When multiple common names exist for a species in the requested language, the first (most commonly used) is returned.
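The selection rule (filter by language, then keep the first name per species) can be sketched as follows, assuming the merged name table is already ordered from most to least commonly used. The table, its columns and pick_common_name() are hypothetical. Whether taxify ever falls back to untagged (lang = NA) names is not stated here, and the sketch does not.

```r
names_tbl <- data.frame(
  accepted_name = c("Species A", "Species A", "Species A", "Species B"),
  lang          = c("en", "en", "de", NA),
  common_name   = c("name one", "name two", "Name drei", "name four")
)

pick_common_name <- function(x, tbl, lang = "en") {
  # %in% drops untagged (NA) rows unless lang itself is NA.
  sub <- tbl[tbl$lang %in% lang, , drop = FALSE]
  # Rows are assumed pre-sorted by usage, so the first one wins.
  sub <- sub[!duplicated(sub$accepted_name), , drop = FALSE]
  idx <- match(x$accepted_name, sub$accepted_name, incomparables = NA)
  x$common_name <- sub$common_name[idx]
  x
}

result <- data.frame(accepted_name = c("Species A", "Species B"))
pick_common_name(result, names_tbl, "en")$common_name
pick_common_name(result, names_tbl, "de")$common_name
```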
Value
The same data.frame with an additional column:
- common_name
The vernacular name in the requested language, or NA if none is available.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur") |>
add_common_names()
taxify("Quercus robur") |>
add_common_names(lang = "de")
options(old)
Add conservation status
Description
Joins IUCN Red List conservation status to a taxify() result by
looking up accepted_name in the conservation status enrichment.
Usage
add_conservation_status(x, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| verbose | Logical. Show download progress if enrichment data needs to be fetched. Default TRUE. |
Details
Conservation status values are compiled from publicly available sources including GBIF and the IUCN Red List API. Coverage is global across all taxonomic groups (~166k species).
Value
The same data.frame with an additional column:
- conservation_status
IUCN category: "LC" (Least Concern), "NT" (Near Threatened), "VU" (Vulnerable), "EN" (Endangered), "CR" (Critically Endangered), "EW" (Extinct in the Wild), "EX" (Extinct), or NA if not assessed.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Panthera tigris", backend = "gbif") |>
add_conservation_status()
options(old)
Add custom data by taxonomic matching
Description
Joins an external data source (CSV file or data.frame) to a taxify()
result. Species names in the external data are matched through the same
backbone(s) used in the original taxify() call, and the join is performed
on accepted_id, so synonyms in either dataset resolve to the same key.
Usage
add_data(
x,
data,
species_col = NULL,
table = NULL,
sheet = NULL,
start_row = NULL,
cols = NULL,
group_col = NULL,
groups = "all",
fuzzy = TRUE,
fuzzy_threshold = 0.2,
verbose = TRUE
)
Arguments
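| x | A data.frame returned by taxify(). |
| data | One of: |
| species_col | Character. Name of the column in data that holds the species names. Default NULL, in which case the column is auto-detected (see Details). |
| table | Character. Required when |
| sheet | Integer or character. Sheet to read when |
| start_row | Integer. Row where column headers begin in an |
| cols | Character vector of column names from data. Default NULL. |
| group_col | Character or NULL. Column in data that distinguishes multiple rows per species (e.g., country); see Grouped data in Details. Default NULL. |
| groups | Character vector or |
| fuzzy | Logical. Enable fuzzy matching for names in data. Default TRUE. |
| fuzzy_threshold | Numeric. Maximum allowed distance for fuzzy matches. Default 0.2. |
| verbose | Logical. Default TRUE. |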
Details
The workflow:
1. Read data (CSV or data.frame).
2. Identify the species column (explicit or auto-detected).
3. Match species names through the same backbone(s) as the original taxify() call, obtaining accepted_id for each row.
4. Check for conflicting duplicates: if multiple rows in data resolve to the same accepted_id with different values, an error is raised (unless group_col is set). Exact duplicates produce a warning and are deduplicated.
5. Left-join on accepted_id.
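The duplicate check in the workflow distinguishes conflicting rows from exact copies. A sketch of that check, assuming the rows have already been reduced to accepted_id plus the value columns (so two synonyms carrying identical values count as exact duplicates) and ignoring the group_col exemption. check_duplicates() is a hypothetical helper and its messages are not taxify's.

```r
check_duplicates <- function(d, key = "accepted_id") {
  n_before <- nrow(d)
  d <- unique(d)  # identical rows collapse to one
  if (nrow(d) < n_before) {
    warning(n_before - nrow(d), " exact duplicate row(s) removed")
  }
  # Any key still repeated must carry differing values: a conflict.
  conflicts <- unique(d[[key]][duplicated(d[[key]])])
  if (length(conflicts) > 0) {
    stop("Conflicting values for ", key, ": ", paste(conflicts, collapse = ", "))
  }
  d
}

ok  <- data.frame(accepted_id = c("a1", "a1", "b2"), height = c(30, 30, 25))
bad <- data.frame(accepted_id = c("a1", "a1"), height = c(30, 12))

nrow(suppressWarnings(check_duplicates(ok)))  # 2
try(check_duplicates(bad))                    # error: conflicting values
```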
Grouped data
When your data has multiple rows per species (e.g., one row per species
per country), set group_col to produce wide output with suffixed
columns. This is the same format as the built-in grouped enrichments.
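Widening one-row-per-group data into suffixed columns can be pictured with base R's reshape(). The data and the status_<group> naming are illustrative assumptions; the exact suffix format taxify uses is not documented in this section.

```r
long <- data.frame(
  accepted_id = c("a1", "a1", "b2"),
  country     = c("AT", "DE", "AT"),
  status      = c("native", "alien", "alien")  # toy values
)

# One row per accepted_id; one status column per country.
wide <- reshape(long, idvar = "accepted_id", timevar = "country",
                direction = "wide", sep = "_")
wide
```

Species b2 has no "DE" row, so its status_DE cell is NA; after widening, each accepted_id appears once and the join stays one-to-one.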
Auto-detection
When species_col is not specified, add_data() takes the first 10 rows
of each character column and runs them through taxify(). The column with
the highest match rate is selected. If no column achieves at least 50%
matches, an error is raised asking the user to specify species_col
explicitly.
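The auto-detection heuristic can be sketched as follows. The is_known_name argument stands in for the taxify() backbone match (here an exact %in% against a toy name list, whereas taxify may also match fuzzily); detect_species_col() and its messages are hypothetical.

```r
detect_species_col <- function(d, is_known_name, n = 10, min_rate = 0.5) {
  chr_cols <- names(d)[vapply(d, is.character, logical(1))]
  if (length(chr_cols) == 0) stop("No character columns; please set species_col.")
  # Match rate over the first n values of each character column.
  rates <- vapply(chr_cols, function(col) {
    mean(is_known_name(head(d[[col]], n)))
  }, numeric(1))
  best <- names(which.max(rates))
  if (rates[[best]] < min_rate) {
    stop("No column reaches ", min_rate * 100, "% matches; please set species_col.")
  }
  best
}

backbone <- c("Species A", "Species B")
is_known <- function(v) v %in% backbone

d <- data.frame(
  site   = c("s1", "s2", "s3"),
  taxon  = c("Species A", "Species B", "Specis A"),  # one typo
  height = c(1.2, 0.4, 3.1)
)
detect_species_col(d, is_known)  # "taxon"
```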
Value
The input data.frame with additional columns from data, joined
via backbone-resolved accepted_id. Columns from data that collide
with existing columns in x are prefixed with "data_".
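A sketch of the collision rule, renaming clashing columns of data before the join while leaving the join key untouched; prefix_collisions() is a hypothetical helper.

```r
prefix_collisions <- function(x, extra, key = "accepted_id", prefix = "data_") {
  # Columns present in both tables, other than the join key.
  clash <- setdiff(intersect(names(extra), names(x)), key)
  hit <- names(extra) %in% clash
  names(extra)[hit] <- paste0(prefix, names(extra)[hit])
  extra
}

x     <- data.frame(accepted_id = "a1", accepted_name = "Species A", height = 30)
extra <- data.frame(accepted_id = "a1", height = 28, notes = "toy")
names(prefix_collisions(x, extra))
```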
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
result <- taxify(c("Quercus robur", "Pinus sylvestris"))
traits <- data.frame(species = c("Quercus robur", "Pinus sylvestris"),
height = c(30, 25))
result |> add_data(traits, species_col = "species")
options(old)
Add seed mass and plant height (Diaz et al. 2022)
Description
Joins species-level mean seed mass and plant height from Diaz et al.
(2022) to a taxify() result by looking up accepted_name.
Usage
add_diaz_traits(x, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| verbose | Logical. Default TRUE. |
Details
Source: Diaz et al. 2022, TRY File Archive (CC BY 3.0). Coverage: ~46k plant species. Plants only.
Value
The same data.frame with additional columns:
- seed_mass_mg
Seed mass in milligrams (species-level mean).
- plant_height_m
Plant height in metres (species-level mean).
References
Diaz S et al. (2022) The global spectrum of plant form and function: enhanced species-level trait data. TRY File Archive.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur") |>
add_diaz_traits()
options(old)
Add British plant traits from Ecoflora
Description
Joins traits from the Ecological Flora of the British Isles (Fitter & Peat
1994) to a taxify() result by looking up accepted_name. Ecoflora covers
the vascular flora of the British Isles, providing canopy height, leaf
traits, life form, flowering phenology, pollination and reproduction, seed
weight, and British-calibrated Ellenberg indicator values. Every column
carries a _uk suffix to mark the British-flora calibration and to avoid
collisions when chained with other plant-trait enrichments
(e.g. add_baseflor() for France, add_floraweb() for Germany).
Usage
add_ecoflora(x, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| verbose | Logical. Default TRUE. |
Details
Source: Ecoflora (Ecological Flora of the British Isles). Ecoflora has no
bulk download or API; the bundled dataset was collected one species at a
time and is redistributed under the source licence (CC BY-NC-SA 4.0). The
.vtr is downloaded from the taxify release on first use and cached.
For French-flora traits see add_baseflor(); for German-flora traits see
add_floraweb(); for European-calibration indicator values see add_eive().
Value
The same data.frame with additional _uk columns:
- height_max_mm_uk, height_min_mm_uk
Canopy height range (mm).
- leaf_area_uk
Leaf area class.
- leaf_longevity_uk
Leaf longevity (e.g. evergreen, deciduous).
- root_system_uk
Root system type.
- photosynthetic_pathway_uk
Photosynthetic pathway (C3/C4/CAM).
- life_form_uk
Raunkiaer life form.
- reproduction_uk
Reproduction method.
- flower_begin_month_uk, flower_end_month_uk
Flowering months (1-12).
- pollination_vector_uk
Pollen vector(s).
- seed_weight_mg_uk
Seed weight (mg).
- propagule_uk
Propagule / dispersule type.
- ell_light_uk, ell_moisture_uk, ell_reaction_uk, ell_nitrogen_uk, ell_salt_uk
Ellenberg indicator values calibrated for the British flora (light, moisture, reaction, nitrogen, salt).
References
Fitter AH, Peat HJ (1994) The Ecological Flora Database. Journal of Ecology 82:415-425.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Bellis perennis") |>
add_ecoflora()
options(old)
Add EIVE ecological indicator values
Description
Joins EIVE 1.0 (Dengler et al. 2023) ecological indicator values to a
taxify() result by looking up accepted_name. EIVE provides
continuous indicator values for European vascular plants, superseding
the original ordinal Ellenberg values.
Usage
add_eive(x, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| verbose | Logical. Default TRUE. |
Details
Source: EIVE 1.0 (Dengler et al. 2023, Zenodo, CC BY 4.0). Coverage: ~14.5k European vascular plant species.
Value
The same data.frame with additional columns:
- eive_light
Light indicator value (continuous).
- eive_temperature
Temperature indicator value (continuous).
- eive_moisture
Moisture indicator value (continuous).
- eive_reaction
Soil reaction (pH) indicator value (continuous).
- eive_nutrients
Nutrient indicator value (continuous).
References
Dengler J et al. (2023) EIVE 1.0 – a standardized set of Ecological Indicator Values for Europe. Vegetation Classification and Survey 4:7-29. doi:10.3897/VCS.98324
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Arrhenatherum elatius") |>
add_eive()
options(old)
Add diet, foraging, and body mass (EltonTraits 1.0)
Description
Joins EltonTraits 1.0 diet composition, foraging strata, body mass,
and activity data to a taxify() result by looking up accepted_name.
Usage
add_elton_traits(x, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| verbose | Logical. Default TRUE. |
Details
Source: EltonTraits 1.0 (Wilman et al. 2014, Figshare, CC0). Coverage: ~15.4k species. Birds and mammals only.
Value
The same data.frame with additional columns:
- diet_inv
Percentage of diet: invertebrates.
- diet_vend
Percentage of diet: endothermic vertebrates.
- diet_vect
Percentage of diet: ectothermic vertebrates.
- diet_vfish
Percentage of diet: fish.
- diet_vunk
Percentage of diet: unknown vertebrates.
- diet_scav
Percentage of diet: scavenging.
- diet_fruit
Percentage of diet: fruit.
- diet_nect
Percentage of diet: nectar.
- diet_seed
Percentage of diet: seeds and nuts.
- diet_plantother
Percentage of diet: other plant material.
- foraging_water
Percentage of foraging: below water surface.
- foraging_ground
Percentage of foraging: on ground.
- foraging_understory
Percentage of foraging: in understory.
- foraging_midhigh
Percentage of foraging: in mid to high strata.
- foraging_canopy
Percentage of foraging: in canopy.
- foraging_aerial
Percentage of foraging: aerial.
- elton_body_mass_g
Body mass in grams.
- nocturnal
Nocturnal activity (0 = diurnal, 1 = nocturnal).
References
Wilman H et al. (2014) EltonTraits 1.0: Species-level foraging attributes of the world's birds and mammals. Ecology 95:2027.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Parus major", backend = "gbif") |>
add_elton_traits()
options(old)
Add freshwater fish morphological traits (FISHMORPH)
Description
Joins FISHMORPH morphological trait data to a taxify() result by
looking up accepted_name.
Usage
add_fish_traits(x, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| verbose | Logical. Default TRUE. |
Details
Source: FISHMORPH (Brosse et al. 2021, Figshare, CC BY 4.0). Coverage: ~8.3k freshwater fish species.
Value
The same data.frame with additional columns:
- fish_max_body_length
Maximum body length (cm).
- fish_body_elongation
Body elongation (body length / body depth).
- fish_vertical_eye_position
Vertical eye position (eye position / head depth).
- fish_relative_eye_size
Relative eye size (eye diameter / head length).
- fish_oral_gape_position
Oral gape position (mouth position: 0 = inferior, 0.5 = terminal, 1 = superior).
- fish_relative_maxillary_length
Relative maxillary length (maxillary length / head length).
- fish_body_lateral_shape
Body lateral shape (body depth / caudal peduncle depth).
- fish_pectoral_fin_position
Pectoral fin vertical position (fin insertion depth / body depth).
- fish_pectoral_fin_size
Pectoral fin size (fin length / body length).
- fish_caudal_peduncle_throttling
Caudal peduncle throttling (caudal peduncle depth / caudal fin depth).
References
Brosse S, Charpin N, Su G, Toussaint A, Herrera-R GA, Tedesco PA, Villéger S (2021) FISHMORPH: A global database on morphological traits of freshwater fishes. Global Ecology and Biogeography 30:2330-2336. doi:10.1111/geb.13395
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Salmo trutta", backend = "gbif") |>
add_fish_traits()
options(old)
Add fish traits (FishBase)
Description
Joins FishBase morphological and ecological traits to a taxify() result
by looking up accepted_name.
Usage
add_fishbase(x, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| verbose | Logical. Default TRUE. |
Details
Source: FishBase via rfishbase (Froese & Pauly, CC BY-NC 3.0). Coverage: ~35k fish species. Fishes only.
The build-from-source fallback requires the rfishbase package
(available on CRAN). Pre-built .vtr files do not require rfishbase.
Value
The same data.frame with additional columns:
- fb_body_length_cm
Maximum body length in centimetres.
- fb_body_mass_g
Body mass in grams (estimated from length-weight relationships where available).
- fb_trophic_level
Trophic level.
- fb_depth_min_m
Minimum depth in metres.
- fb_depth_max_m
Maximum depth in metres.
- fb_vulnerability
Vulnerability index (0–100).
- fb_habitat
Habitat type (e.g. demersal, pelagic).
- fb_importance
Commercial importance category.
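Mass estimates from length-weight relationships conventionally use the power law W = a * L^b, with species-specific parameters a and b. A one-line sketch; lw_mass_g() is hypothetical and the parameter values are illustrative only, not FishBase estimates. How taxify chooses among several published a/b pairs for a species is not stated here.

```r
# Length-weight relationship: W = a * L^b (W in grams, L in centimetres).
lw_mass_g <- function(length_cm, a, b) a * length_cm^b

lw_mass_g(50, a = 0.01, b = 3)  # 1250
```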
References
Froese R, Pauly D (eds.) (2024) FishBase. World Wide Web electronic publication, https://www.fishbase.org.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Gadus morhua", backend = "gbif") |>
add_fishbase()
options(old)
Add German plant traits from FloraWeb
Description
Joins traits from FloraWeb (Bundesamt fuer Naturschutz) to a taxify()
result by looking up accepted_name. FloraWeb is the live national portal
carrying the BiolFlor trait data (Klotz, Kuehn & Durka 2002) together with
Rothmaler morphology and Ellenberg indicator values. This enrichment covers
the full per-species trait profile scraped from the four FloraWeb trait
pages: morphology, reproductive biology (including ploidy and chromosome
number), ecology (including the nine Ellenberg indicator values), and
chorological distribution. Every
column carries a _de suffix to mark the German-flora calibration and to
avoid collisions when chained with other plant-trait enrichments
(e.g. add_ecoflora() for Britain, add_baseflor() for France).
Usage
add_floraweb(x, verbose = TRUE)
Arguments
| x | A data.frame returned by taxify(). |
| verbose | Logical. Default TRUE. |
Details
Source: FloraWeb (https://www.floraweb.de/), Bundesamt fuer Naturschutz,
Bonn. FloraWeb has no bulk export or API; the bundled dataset was scraped
per species (accessed 2026-06-24) and that access date is the dataset
version. The trait data largely derive from BiolFlor, which per the BioFresh
metadata statement is publicly available and may be used without
restrictions provided it is acknowledged and cited correctly. The .vtr is
downloaded from the taxify release on first use and cached.
For British-flora traits see add_ecoflora(); for French-flora traits see
add_baseflor(); for European-calibration indicator values see add_eive().
Value
The same data.frame with German trait columns (all suffixed _de),
grouped as:
- Morphology
height_de, life_form_de, leaf_shape_de, leaf_anatomy_de, leaf_persistence_de, storage_organs_de, flowering_months_de, flowering_months_biolflor_de, flowering_phase_de, phenological_season_de, description_de.
- Reproductive biology
pollination_vector_de, pollinator_de, pollinator_reward_de, flower_type_de, flower_class_de, dispersal_type_de, diaspore_type_de, germinule_type_de, reproduction_type_de, vegetative_spread_de, fertilization_type_de, apomixis_de, dicliny_de, dichogamy_de, self_incompatibility_de, si_mechanism_de, ploidy_de, chromosome_number_de, chromosome_freq_de, chromosomes_de.
- Ecology
The nine Ellenberg indicator values (ell_light_de, ell_temperature_de, ell_continentality_de, ell_moisture_de, ell_moisture_variability_de, ell_reaction_de, ell_nitrogen_de, ell_salt_de, heavy_metal_resistance_de), plus strategy_type_de (Grime CSR), habitat_site_de, formation_de, plant_community_de, biotope_type_de, forest_binding_de, hemeroby_de, urbanity_de.
- Distribution
floristic_zones_de, areal_formula_de, areal_type_de, oceanity_de, range_centre_de, world_range_size_de, world_range_frequency_de, world_range_position_de, world_range_hazard_de, germany_range_share_de, germany_responsibility_de.
Categorical traits with several applicable values are joined with "; ". Trait values are German (as published by FloraWeb / BiolFlor).
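Because multi-valued traits arrive as one "; "-joined string, they usually need splitting before analysis. A sketch with base R's strsplit(); has_value() is a hypothetical helper and the values are illustrative placeholders, not actual FloraWeb vocabulary.

```r
vals <- c("Insekten; Wind", "Insekten", NA)  # illustrative values

# TRUE/FALSE per row, NA when the trait itself is missing.
has_value <- function(x, value, sep = "; ") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) if (all(is.na(p))) NA else value %in% p, logical(1))
}

has_value(vals, "Wind")
```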
References
Klotz S, Kuehn I, Durka W (2002) BIOLFLOR - Eine Datenbank zu biologisch-oekologischen Merkmalen der Gefaesspflanzen in Deutschland. Schriftenreihe fuer Vegetationskunde 38. Bundesamt fuer Naturschutz, Bonn.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Bellis perennis") |>
add_floraweb()
options(old)
Add fungal lifestyle and trait data (FungalTraits)
Description
Joins FungalTraits (Polme et al. 2020) genus-level trait data to a
taxify() result by looking up genus. Unlike other enrichments that
join on species-level accepted_name, FungalTraits is a genus-level
database and joins on the genus column already present in taxify output.
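A genus-level join amounts to a lookup keyed on genus, so every species in a genus receives the same trait row. The minimal base-R sketch below uses toy tables. The trait values are illustrative rather than taken from FungalTraits, and this is not taxify's internal code. It uses match() instead of merge() so the row order of the result is preserved.

```r
# Toy taxify-like result: three species, two of them in the same genus.
res <- data.frame(
  input_name = c("Amanita muscaria", "Amanita phalloides", "Boletus edulis"),
  genus      = c("Amanita", "Amanita", "Boletus"),
  stringsAsFactors = FALSE
)
# Toy genus-level trait table, one row per genus (values illustrative).
traits <- data.frame(
  genus             = c("Amanita", "Russula"),
  primary_lifestyle = c("ectomycorrhizal", "ectomycorrhizal"),
  stringsAsFactors = FALSE
)
# Left join on genus: species share their genus's row; unknown genera get NA.
idx <- match(res$genus, traits$genus)
res$primary_lifestyle <- traits$primary_lifestyle[idx]
```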
Usage
add_fungal_traits(x, verbose = TRUE)
Arguments
x |
A data.frame returned by taxify(). |
verbose |
Logical. Default TRUE. |
Details
Source: FungalTraits (Polme et al. 2020, Fungal Diversity, CC BY 4.0). Coverage: ~10k fungal genera. Genus-level only (not species-level).
Value
The same data.frame with additional columns:
- primary_lifestyle
Primary ecological role (e.g., saprotroph, mycorrhizal, pathogen, endophyte, lichenized, parasite).
- secondary_lifestyle
Secondary ecological role, if any.
- growth_form
Morphological growth form (e.g., agaricoid, corticioid, polyporoid, yeast).
- fruitbody_type
Fruiting body morphology (e.g., gasteroid, pileate, resupinate).
- decay_substrate
Substrate type for saprotrophic genera (e.g., wood, litter, dung, soil).
- plant_pathogenic_capacity
Capacity to cause plant disease (e.g., high, medium, low, none).
- animal_biotrophic_capacity
Capacity for animal biotrophy.
- endophytic_interaction_capability
Capacity for endophytic interactions with plants.
- ectomycorrhiza_exploration_type
Exploration type for ectomycorrhizal genera (e.g., contact, short, medium, long).
References
Polme S et al. (2020) FungalTraits: a user-friendly traits database of fungi and fungus-like stramenopiles. Fungal Diversity 105:1-16. doi:10.1007/s13225-020-00466-2
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Amanita muscaria", backend = "gbif") |>
add_fungal_traits()
options(old)
Add fungal functional guild data (FUNGuild)
Description
Joins FUNGuild trophic mode, guild, growth morphology, and confidence
data to a taxify() result by looking up accepted_name. Species-level
matches take priority; genus-level guild assignments are used as fallback
for unmatched species.
Usage
add_funguild(x, verbose = TRUE)
Arguments
x |
A data.frame returned by taxify(). |
verbose |
Logical. Default TRUE. |
Details
Source: FUNGuild (Nguyen et al. 2016, CC BY 4.0). Coverage: ~13k taxa. Fungi only.
The enrichment first attempts species-level matching. For species without a direct match, it falls back to genus-level guild assignments from FUNGuild's genus-rank entries.
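You can sketch the species-first, genus-fallback order with two match() calls. The table, its rank column, and the helper lookup_guild() are hypothetical stand-ins. They do not reflect FUNGuild's actual layout or taxify's implementation.

```r
# Toy FUNGuild-like table mixing species- and genus-rank entries (illustrative).
guilds <- data.frame(
  taxon = c("Amanita muscaria", "Amanita", "Trametes"),
  rank  = c("species", "genus", "genus"),
  guild = c("Ectomycorrhizal", "Ectomycorrhizal", "Wood Saprotroph"),
  stringsAsFactors = FALSE
)

lookup_guild <- function(accepted_name, genus, tbl) {
  sp <- tbl[tbl$rank == "species", ]
  gn <- tbl[tbl$rank == "genus", ]
  # 1) Species-level match on the full accepted name.
  out <- sp$guild[match(accepted_name, sp$taxon)]
  # 2) Genus-level fallback only where step 1 found nothing.
  miss <- is.na(out)
  out[miss] <- gn$guild[match(genus[miss], gn$taxon)]
  out
}

g <- lookup_guild(c("Amanita muscaria", "Trametes versicolor", "Boletus edulis"),
                  c("Amanita", "Trametes", "Boletus"), guilds)
```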
Value
The same data.frame with additional columns:
- trophic_mode
Trophic mode (e.g., Pathotroph, Saprotroph, Symbiotroph, or hyphenated combinations).
- guild
Functional guild (e.g., "Ectomycorrhizal", "Plant Pathogen", "Wood Saprotroph").
- funguild_growth_form
Growth morphology (e.g., "Agaricoid", "Microfungus"). Prefixed to avoid collision with FungalTraits.
- confidence_ranking
Confidence of the guild assignment (Possible, Probable, Highly Probable).
References
Nguyen NH et al. (2016) FUNGuild: An open annotation tool for parsing fungal community datasets by ecological guild. Fungal Ecology 20:241-248.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Amanita muscaria", backend = "gbif") |>
add_funguild()
options(old)
Add GBIF-specific columns
Description
Joins extra GBIF backbone columns to a taxify() result by
looking up taxon_id in the GBIF backbone. Only enriches rows where
backend == "gbif".
Usage
add_gbif_info(x)
Arguments
x |
A data.frame returned by taxify(). |
Value
The same data.frame with additional columns:
- notho_type
Hybrid type: "GENERIC", "SPECIFIC", or "INFRASPECIFIC".
- nom_status
Nomenclatural status (may contain multiple values).
- bracket_authorship
Basionym author in parentheses.
- bracket_year
Basionym author year.
- gbif_year
Combining author year.
- name_published_in
Publication citation.
- origin
How the name entered the backbone.
- infra_specific_epithet
Infraspecific epithet.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur", backend = "gbif") |>
add_gbif_info()
options(old)
Add naturalized alien flora status (GloNAF)
Description
Joins GloNAF (Global Naturalized Alien Flora) data to a taxify()
result, filtered by region.
Usage
add_glonaf(x, region, verbose = TRUE)
Arguments
x |
A data.frame returned by taxify(). |
region |
Character. GloNAF region identifier(s), or
|
verbose |
Logical. Default TRUE. |
Details
Source: GloNAF v2.0 (van Kleunen et al. 2019, Davis et al. 2025, CC BY 4.0). Coverage: ~16k alien plant taxa across ~1,300 regions. Plants only.
Value
The same data.frame with additional column(s):
- naturalized
Integer: 1 if the species is recorded as naturalized in that region, NA otherwise.
References
van Kleunen M et al. (2019) The Global Naturalized Alien Flora (GloNAF) database. Ecology 100:e02542.
Davis K et al. (2025) The updated Global Naturalized Alien Flora (GloNAF 2.0) database. Ecology, e70245.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Robinia pseudoacacia") |>
add_glonaf(region = "EUR")
taxify("Robinia pseudoacacia") |>
add_glonaf(region = c("EUR", "NAM"))
options(old)
Add hybrid parent and type information
Description
Parses the input_name column from a taxify() result to extract
hybrid parent names and classify the hybrid type.
Usage
add_hybrid_info(x)
Arguments
x |
A data.frame returned by taxify(). |
Value
The same data.frame with additional columns:
- hybrid_parent_1
First parent (full binomial), NA if not a hybrid formula.
- hybrid_parent_2
Second parent (full binomial, abbreviated genus expanded), NA if not a hybrid formula.
- hybrid_type
One of "nothogenus", "nothospecies", "formula", or NA if not a hybrid.
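This simplified sketch shows one way to split a hybrid formula such as "Quercus pyrenaica x Q. petraea" into its two parents. It expands the abbreviated genus of the second parent using the first parent's genus. The helper parse_hybrid_formula() is hypothetical and handles only the two-parent formula case, not nothogenus or nothospecies names.

```r
# Hypothetical helper: split a two-parent hybrid formula and expand "Q."-style genera.
parse_hybrid_formula <- function(name) {
  # Split on a free-standing hybrid sign: " x " or the multiplication sign.
  parts <- strsplit(name, "\\s+(x|\u00d7)\\s+")[[1]]
  if (length(parts) != 2) {
    return(c(parent_1 = NA_character_, parent_2 = NA_character_))
  }
  p1 <- trimws(parts[1])
  p2 <- trimws(parts[2])
  genus1 <- sub("\\s.*$", "", p1)
  # Expand an abbreviated genus when its initial matches the first parent's genus.
  if (grepl("^[A-Z]\\.\\s", p2) && substr(p2, 1, 1) == substr(genus1, 1, 1)) {
    p2 <- sub("^[A-Z]\\.", genus1, p2)
  }
  c(parent_1 = p1, parent_2 = p2)
}

parents <- parse_hybrid_formula("Quercus pyrenaica x Q. petraea")
```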
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus pyrenaica x Q. petraea") |>
add_hybrid_info()
options(old)
Add invasive species status
Description
Joins GRIIS (Global Register of Introduced and Invasive Species) data
to a taxify() result, filtered by country.
Usage
add_invasive_status(x, country, verbose = TRUE)
Arguments
x |
A data.frame returned by taxify(). |
country |
Character. ISO 3166-1 alpha-2 country code(s), or
|
verbose |
Logical. Default TRUE. |
Details
Source: GRIIS (Zenodo combined CSV, CC BY 4.0, 196 countries). Coverage: ~23k name x country combinations.
Value
The same data.frame with additional column(s):
- invasive_status
One of "native", "introduced", "invasive", or NA if not recorded for that country.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Robinia pseudoacacia") |>
add_invasive_status(country = "AT")
taxify("Robinia pseudoacacia") |>
add_invasive_status(country = c("AT", "DE"))
options(old)
Add plant traits from LEDA Traitbase
Description
Joins LEDA Traitbase (Kleyer et al. 2008) plant functional traits to a
taxify() result by looking up accepted_name. LEDA provides species-level
trait data for NW European plant species, covering life form, dispersal,
seed, leaf, and clonality traits.
Usage
add_leda(x, verbose = TRUE)
Arguments
x |
A data.frame returned by taxify(). |
verbose |
Logical. Default TRUE. |
Details
Source: LEDA Traitbase (Kleyer et al. 2008). Coverage: ~8,000 NW European plant species.
The Raunkiaer life form is a bud-position classification system: phanerophyte = buds >25 cm above soil, chamaephyte = buds near soil surface, hemicryptophyte = buds at soil surface, geophyte (cryptophyte) = buds below soil, therophyte = annual that survives as seed.
Value
The same data.frame with additional columns:
- raunkiaer_life_form
Primary Raunkiaer life form classification (phanerophyte, chamaephyte, hemicryptophyte, geophyte, therophyte, helophyte, hydrophyte).
- raunkiaer_variable
1 if the species is assigned to multiple Raunkiaer forms, 0 otherwise.
- dispersal_type
Primary dispersal type (anemochory, zoochory, hydrochory, autochory, barochory, dysochory).
- terminal_velocity_ms
Seed terminal velocity in m/s (species median).
- seed_mass_mg
Seed mass in mg (species median). Prefixed with leda_ in the .vtr to avoid collision with Diaz traits.
- canopy_height_m
Canopy height in meters (species median).
- leaf_mass_mg
Leaf dry mass in mg (species median).
- sla_mm2_mg
Specific leaf area in mm^2/mg (species median).
- clonal_growth
Capable of clonal growth (1 = yes, 0 = no).
- buoyancy
Seed buoyancy classification.
References
Kleyer M et al. (2008) The LEDA Traitbase: a database of life-history traits of the Northwest European flora. Journal of Ecology 96:1266-1274.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Arrhenatherum elatius") |>
add_leda()
options(old)
Add butterfly traits (LepTraits)
Description
Joins LepTraits 1.0 butterfly life-history and ecological traits to a
taxify() result by looking up accepted_name.
Usage
add_leptraits(x, verbose = TRUE)
Arguments
x |
A data.frame returned by taxify(). |
verbose |
Logical. Default TRUE. |
Details
Source: LepTraits 1.0 (Shirey et al. 2022, CC0). Coverage: ~12.4k butterfly species globally (Papilionoidea).
Value
The same data.frame with additional columns:
- wingspan_mm
Wingspan in mm (midpoint of lower and upper bounds).
- voltinism
Number of generations per year.
- diapause_stage
Overwintering/diapause life stage.
- canopy_affinity
Canopy association category.
- edge_affinity
Edge/gap affinity category.
- moisture_affinity
Moisture affinity category.
- disturbance_affinity
Disturbance affinity category.
- n_hostplant_families
Number of host plant families used.
- flight_months
Number of months with adult flight activity.
References
Shirey V et al. (2022) LepTraits 1.0: A globally comprehensive dataset of butterfly traits. Scientific Data 9:398.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Vanessa cardui", backend = "gbif") |>
add_leptraits()
options(old)
Add lizard life-history and ecological traits (Meiri 2018)
Description
Joins lizard trait data from Meiri (2018) to a taxify() result by
looking up accepted_name.
Usage
add_lizard_traits(x, verbose = TRUE)
Arguments
x |
A data.frame returned by taxify(). |
verbose |
Logical. Default TRUE. |
Details
Source: Meiri (2018, Global Ecology and Biogeography, CC BY 4.0). Coverage: ~6,600 lizard species. Lizards only.
Value
The same data.frame with additional columns:
- lizard_body_mass_g
Body mass in grams.
- lizard_svl_mm
Snout-vent length in mm.
- lizard_tail_length_mm
Tail length in mm.
- lizard_clutch_size
Clutch size.
- lizard_clutch_frequency
Clutches per year.
- lizard_longevity_yr
Maximum longevity in years.
- lizard_diet
Diet category.
- lizard_habitat
Habitat type.
- lizard_activity_time
Activity time (diurnal/nocturnal/crepuscular).
- lizard_foraging_mode
Foraging mode (sit-and-wait/active).
References
Meiri S (2018) Traits of lizards of the world: Variation around a successful evolutionary design. Global Ecology and Biogeography 27:1168-1172.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Pogona vitticeps", backend = "gbif") |>
add_lizard_traits()
options(old)
Add mammal life-history traits (PanTHERIA)
Description
Joins PanTHERIA mammal life-history and ecological traits to a
taxify() result by looking up accepted_name.
Usage
add_pantheria(x, verbose = TRUE)
Arguments
x |
A data.frame returned by taxify(). |
verbose |
Logical. Default TRUE. |
Details
Source: PanTHERIA (Jones et al. 2009, Ecological Archives, CC0). Coverage: ~5.4k mammal species. Mammals only.
Value
The same data.frame with additional columns:
- pantheria_body_mass_g
Adult body mass in grams.
- longevity_mo
Maximum longevity in months.
- litter_size
Litter size (mean).
- gestation_d
Gestation length in days.
- weaning_d
Weaning age in days.
- home_range_km2
Home range size in km^2.
- diet_breadth
Diet breadth (number of diet categories).
- habitat_breadth
Habitat breadth (number of habitat types).
References
Jones KE et al. (2009) PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals. Ecology 90:2648.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Vulpes vulpes", backend = "gbif") |>
add_pantheria()
options(old)
Add Italian plant traits from Pignatti (on demand, via TR8)
Description
Fetches Italian Ellenberg-type indicator values, life form, and chorotype
from Pignatti's Flora d'Italia (Pignatti, Menegoni & Pietrosanti 2005) for
the species in a taxify() result, using the TR8 package, and joins them by
accepted_name. TR8 ships these values bundled, so this works offline.
Usage
add_pignatti(x, verbose = TRUE)
Arguments
x |
A data.frame returned by taxify(). |
verbose |
Logical. Default TRUE. |
Details
These values originate in a copyrighted publication, so taxify does not
redistribute them. This function reads the copy bundled in the suggested
package TR8 (which redistributes it under TR8's GPL, with attribution);
taxify ships none of it and no internet access is required. For
European-calibration indicator values see add_eive().
Value
The same data.frame with additional columns:
- light_it, temperature_it, continentality_it, moisture_it, reaction_it, nutrients_it, salinity_it
Ellenberg-type indicator values calibrated for the Italian flora (codes; X = indifferent, 0 = not applicable).
- life_form_it
Life form for the Italian flora.
- chorotype_it
Chorological type (distribution).
References
Pignatti S, Menegoni P, Pietrosanti S (2005) Bioindicazione attraverso le piante vascolari. Braun-Blanquetia 39.
Bocci G (2015) TR8: an R package for easily retrieving plant species traits. Methods in Ecology and Evolution 6:347-350.
Examples
old <- options(taxify.data_dir = taxify_example_data())
# add_pignatti() fetches Italian trait data on demand via the TR8 package.
taxify("Abies alba") |>
add_pignatti()
options(old)
Add qualifier information
Description
Parses the input_name column from a taxify() result to extract
taxonomic qualifiers (cf., aff., s.l., etc.) and their positions.
Usage
add_qualifier_info(x)
Arguments
x |
A data.frame returned by taxify(). |
Value
The same data.frame with additional columns:
- qualifier
The qualifier found (e.g., "cf.", "aff."), or NA if none.
- qualifier_position
Integer position (character index) of the qualifier in the original name, or NA if none.
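You can sketch qualifier detection as a regular-expression search that records where the qualifier starts. The helper find_qualifier() and its qualifier list are assumptions for illustration. taxify's own list and position convention may differ. In this sketch the position is the 1-based index of the qualifier's first character.

```r
# Hypothetical helper: locate a taxonomic qualifier and its start position.
find_qualifier <- function(names, qualifiers = c("cf.", "aff.", "s.l.", "s.str.")) {
  # Put each dot in a character class so it matches a literal ".".
  alts <- gsub(".", "[.]", qualifiers, fixed = TRUE)
  # Group 2 is the qualifier; it must be delimited by whitespace or string ends.
  pat <- paste0("(^|\\s)(", paste(alts, collapse = "|"), ")(?=\\s|$)")
  m <- regexpr(pat, names, perl = TRUE)
  start <- attr(m, "capture.start")[, 2]
  len <- attr(m, "capture.length")[, 2]
  found <- as.vector(m) != -1   # drop regexpr attributes before using ifelse()
  data.frame(
    qualifier = ifelse(found, substring(names, start, start + len - 1), NA_character_),
    qualifier_position = ifelse(found, start, NA_integer_),
    stringsAsFactors = FALSE
  )
}

q <- find_qualifier(c("Pinus cf. sylvestris", "Quercus robur"))
```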
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Pinus cf. sylvestris") |>
add_qualifier_info()
options(old)
Add WCVP native range status
Description
Joins WCVP (World Checklist of Vascular Plants, Kew) native range
data to a taxify() result, filtered by TDWG botanical region.
Usage
add_wcvp(x, region, verbose = TRUE)
Arguments
x |
A data.frame returned by taxify(). |
region |
Character. TDWG Level 2 region code(s), or
|
verbose |
Logical. Default TRUE. |
Details
Source: WCVP (Kew, CC BY). Coverage: ~340k plant species. Plants only.
Value
The same data.frame with additional column(s):
- native_status
One of "native", "introduced", "extinct", or NA if not recorded for that region.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur") |>
add_wcvp(region = "EUR")
taxify("Quercus robur") |>
add_wcvp(region = c("EUR", "NAM"))
options(old)
Add WFO-specific columns
Description
Joins extra World Flora Online columns to a taxify() result by
looking up taxon_id in the WFO backbone.
Usage
add_wfo_info(x)
Arguments
x |
A data.frame returned by taxify(). |
Value
The same data.frame with additional columns:
- scientificNameID
WFO scientificNameID.
- parentNameUsageID
WFO parentNameUsageID.
- namePublishedIn
Publication reference.
- higherClassification
Higher classification string.
- taxonRemarks
Taxonomic remarks.
- infraspecificEpithet
Infraspecific epithet (for subspecies, varieties, forms).
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur") |>
add_wfo_info()
options(old)
Add woodiness classification
Description
Joins woodiness data from Zanne et al. (2014) to a taxify() result
by looking up accepted_name.
Usage
add_woodiness(x, verbose = TRUE)
Arguments
x |
A data.frame returned by taxify(). |
verbose |
Logical. Default TRUE. |
Details
Source: Zanne et al. 2014, Nature (Dryad, CC0). Coverage: ~50k plant species. Plants only.
Value
The same data.frame with an additional column:
- woodiness
One of "woody", "herbaceous", "variable", or NA if not in the dataset.
References
Zanne AE et al. (2014) Three keys to the radiation of angiosperms into freezing environments. Nature 506:89-92.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur") |>
add_woodiness()
options(old)
Cite data sources used in a taxify result
Description
Prints formatted citations for the taxonomic backbone(s), enrichment layers, and the taxify package itself. Optionally writes a BibTeX file.
Usage
cite(x, file = NULL)
Arguments
x |
A |
file |
Optional file path. If provided, BibTeX entries are written
to this file (extension should be .bib). |
Value
x, invisibly (pipe-friendly).
Examples
result <- taxify("Quercus robur", backend = "wfo")
result |> cite()
result |> cite(file = tempfile(fileext = ".bib"))
Embed accepted taxon info at build time (synonym self-join)
Description
Used by the taxifydb build pipeline and by taxify's own test fixtures.
For every synonym row, resolves the accepted taxon and embeds its name,
family, genus, and (when authorship_col is supplied) authorship directly.
Handles synonym chains by iterating until stable (max 10 hops).
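The chain resolution described above can be sketched as repeated pointer-following. The loop stops when nothing changes or the hop limit is reached. resolve_accepted() is a hypothetical helper that works on two ID vectors, where NA means the row is itself accepted. It is not the function's actual code.

```r
# Hypothetical helper: follow accepted-ID pointers until stable (max_hops cap).
resolve_accepted <- function(id, acc_id, max_hops = 10L) {
  # Start from each row's direct pointer; accepted rows point to themselves.
  target <- ifelse(is.na(acc_id), id, acc_id)
  for (hop in seq_len(max_hops)) {
    # Follow one more link: the accepted pointer of the current target.
    nxt <- target
    pos <- match(target, id)
    ok <- !is.na(pos) & !is.na(acc_id[pos])
    nxt[ok] <- acc_id[pos[ok]]
    if (identical(nxt, target)) break   # stable: no pointer changed
    target <- nxt
  }
  target
}

ids <- c("t1", "t2", "t3", "t4")
acc <- c(NA, "t1", "t2", NA)   # t3 -> t2 -> t1 is a two-hop chain
resolved <- resolve_accepted(ids, acc)
```

The hop cap also stops a cyclic pair (a -> b -> a) from looping forever.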
Usage
embed_accepted(
df,
id_col,
acc_id_col,
name_col,
family_col,
genus_col,
status_col,
synonym_pattern = "SYNONYM",
authorship_col = NULL
)
Arguments
df |
The full backbone data.frame. |
id_col |
Name of the taxon ID column. |
acc_id_col |
Name of the accepted name usage ID column. |
name_col |
Name of the canonical name column. |
family_col |
Name of the family column. |
genus_col |
Name of the genus column. |
status_col |
Name of the taxonomic status column. |
synonym_pattern |
Regex pattern to detect synonyms in status column. |
authorship_col |
Optional name of the authorship column. When supplied,
the resolved accepted name's authorship is embedded as
accepted_authorship. |
Value
The data.frame with added columns: accepted_name, accepted_family, accepted_genus, accepted_taxon_id, accepted_authorship, is_synonym.
Export a taxify result to file
Description
Writes a taxify() result (with any enrichments) to disk in one of several
formats. The default .vtr format preserves column types and is fast to
re-read with add_data().
Usage
export_data(x, path, overwrite = FALSE)
Arguments
x |
A data.frame returned by taxify(). |
path |
Character. Output file path. The format is inferred from the
extension: |
overwrite |
Logical. Overwrite an existing file? Default FALSE. |
Value
Invisibly returns path.
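You can sketch extension-based dispatch for the two plain-text formats. export_sketch() is hypothetical. It writes only .csv and .tsv and leaves out the .vtr writer, whose vectra API is not shown here.

```r
# Hypothetical sketch: choose a writer from the file extension.
export_sketch <- function(x, path, overwrite = FALSE) {
  if (file.exists(path) && !overwrite) {
    stop("File exists: ", path, call. = FALSE)
  }
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::write.csv(x, path, row.names = FALSE),
    tsv = utils::write.table(x, path, sep = "\t", row.names = FALSE),
    stop("Unsupported extension in this sketch: .", ext, call. = FALSE)
  )
  invisible(path)
}
```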
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
result <- taxify(c("Quercus robur", "Pinus sylvestris"))
result |> export_data(tempfile(fileext = ".vtr"))
result |> export_data(tempfile(fileext = ".csv"))
result |> export_data(tempfile(fileext = ".tsv"))
options(old)
List available enrichments
Description
Returns a summary of all enrichment layers available in the taxify manifest, including version, row count, whether the dataset is static, and which trait columns are provided.
Usage
list_enrichments(verbose = TRUE)
Arguments
verbose |
Logical. Default TRUE. |
Value
A data.frame with columns: name, version, nrow, static,
trait_cols (comma-separated), and source_url.
Examples
list_enrichments()
Look up a genus in the register
Description
Returns the register row for the given genus, or NULL if not found.
Auto-loads the register on first call.
Usage
lookup_genus(genus)
Arguments
genus |
Character scalar. The genus name to look up. |
Value
A one-row data.frame, or NULL if the genus is not in the register.
Vectorized Latin orthographic normalization
Description
Reduces common Latin spelling alternations to a canonical form so that
e.g. hirtaeformis and hirtiformis produce the same normalized key.
Applied identically to both query names and backbone names so the keys
line up on either side of the join.
Usage
normalize_epithets(names)
Arguments
names |
Character vector of cleaned taxonomic names (genus + epithet). |
Details
Pipeline:
1. Lowercase.
2. Strip Latin-1 diacritics and ligatures (e-acute to e, ae-ligature to ae, sharp-s to ss, etc.), applied to genus and epithet.
3. Orthographic alternation on the epithet only: ae/oe -> i, trailing ii -> i, y -> i, ph -> f, rh -> r, th -> t.
Step 2 runs before step 3, so ae-ligature -> ae -> i and oe-ligature -> oe -> i fold into the same key as the de-ligatured forms.
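The following base-R sketch illustrates the pipeline. normalize_epithets_sketch() is a hypothetical re-implementation for illustration only. It folds just a handful of Latin-1 characters rather than the full set.

```r
normalize_epithets_sketch <- function(names) {
  # Step 1: lowercase everything.
  x <- tolower(names)
  # Step 2: fold a few diacritics and ligatures (illustrative subset), genus and epithet alike.
  from <- c("\u00e1", "\u00e9", "\u00eb", "\u00ef", "\u00f6", "\u00fc", "\u00e6", "\u0153", "\u00df")
  to   <- c("a", "e", "e", "i", "o", "u", "ae", "oe", "ss")
  for (i in seq_along(from)) x <- gsub(from[i], to[i], x, fixed = TRUE)
  # Split genus from the rest at the first space.
  sp <- as.vector(regexpr(" ", x, fixed = TRUE))
  has_ep <- sp > 0
  genus <- ifelse(has_ep, substr(x, 1, sp - 1), x)
  ep <- ifelse(has_ep, substring(x, sp + 1), "")
  # Step 3: orthographic alternations, epithet only.
  ep <- gsub("ae|oe", "i", ep)
  ep <- sub("ii$", "i", ep)
  ep <- gsub("y", "i", ep, fixed = TRUE)
  ep <- gsub("ph", "f", ep, fixed = TRUE)
  ep <- gsub("rh", "r", ep, fixed = TRUE)
  ep <- gsub("th", "t", ep, fixed = TRUE)
  ifelse(has_ep, paste(genus, ep), genus)
}

keys <- normalize_epithets_sketch(c("Aloe hirtaeformis", "Aloe hirtiformis"))
```

Because the genus is excluded from step 3, "aloe" keeps its "oe" while the epithet variants collapse to one key.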
Value
Character vector of normalized forms.
Precompute matching keys at build time
Description
Used by the taxifydb build pipeline and by taxify's own test fixtures.
Adds key_ci, key_normalized, key_species, and fuzzy_block columns
to the backbone data.frame for direct lookup at query time.
Usage
precompute_keys(df, name_col, genus_col, epithet_col)
Arguments
df |
The backbone data.frame. |
name_col |
Name of the canonical name column. |
genus_col |
Name of the genus column. |
epithet_col |
Name of the specific epithet column. |
Value
The data.frame with added key columns.
Print a taxify_result
Description
Delegates to the standard data.frame print method.
Usage
## S3 method for class 'taxify_result'
print(x, ...)
Arguments
x |
A taxify_result object. |
... |
Passed to the next method. |
Value
x, invisibly.
Score match candidates by resolution priority
Description
Computes the per-row priority scores used to rank backbone candidates for a
name (smaller is better): ACCEPTED over SYNONYM (status_score), SPECIES
over higher ranks (rank_score), nomenclaturally Valid over other statuses
(valid_score), and an epithet-preserving accepted target over others
(epithet_score, which favours the homotypic basionym among same-name
homonym synonyms, e.g. Pinus abies -> Picea abies). Used by the matching
engine's best-match selection and, in the taxifydb build pipeline, to
collapse each backbone key to the single accepted name that taxify()
resolves it to.
Usage
score_candidates(candidates)
Arguments
candidates |
A data.frame with |
Value
A list with integer vectors status_score, rank_score,
valid_score, epithet_score, and the character tier signature
("status/rank/valid/epithet") per row, in input order.
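The sketch below applies the scoring idea to two toy candidates for the query Pinus abies. The data is illustrative, not real nomenclature. score_sketch() and the column names status, rank, nom_status and accepted_name are assumptions, not the real candidates schema. The ranking calls order() over the four scores, assuming lexicographic priority in the order of the tier signature.

```r
# Hypothetical scoring helper: 0 = preferred, 1 = not preferred, per criterion.
score_sketch <- function(cand, query_epithet) {
  status_score  <- ifelse(cand$status == "ACCEPTED", 0L, 1L)
  rank_score    <- ifelse(cand$rank == "SPECIES", 0L, 1L)
  valid_score   <- ifelse(!is.na(cand$nom_status) & cand$nom_status == "Valid", 0L, 1L)
  # Epithet of the accepted target: everything after the first space.
  acc_epithet   <- sub("^[^ ]+ +", "", cand$accepted_name)
  epithet_score <- ifelse(acc_epithet == query_epithet, 0L, 1L)
  tier <- paste(status_score, rank_score, valid_score, epithet_score, sep = "/")
  list(status_score = status_score, rank_score = rank_score,
       valid_score = valid_score, epithet_score = epithet_score, tier = tier)
}

# Two same-name synonyms pointing at different accepted taxa (toy data).
cand <- data.frame(
  matched_name  = c("Pinus abies", "Pinus abies"),
  accepted_name = c("Abies alba", "Picea abies"),
  status        = c("SYNONYM", "SYNONYM"),
  rank          = c("SPECIES", "SPECIES"),
  nom_status    = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)
s <- score_sketch(cand, query_epithet = "abies")
# Lexicographic ranking: status, then rank, then validity, then epithet.
best <- order(s$status_score, s$rank_score, s$valid_score, s$epithet_score)[1]
```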
Summarise a taxify_result
Description
Prints a compact digest of match quality and life-form scope to the console.
Uses cat() so output is captured by capture.output() and rendered
correctly in knitr chunks.
Usage
## S3 method for class 'taxify_result'
summary(object, ...)
Arguments
object |
A taxify_result object. |
... |
Ignored. |
Value
object, invisibly.
Match taxonomic names against local backbone databases
Description
Matches a vector of taxonomic names against locally stored Darwin Core backbone databases. Returns a data.frame with one row per input name containing the matched name, accepted name, taxonomic hierarchy, and match quality information.
Usage
taxify(
x,
backend = "wfo",
fuzzy = TRUE,
fuzzy_threshold = 0.2,
fuzzy_method = c("dl", "levenshtein", "jw"),
verbose = TRUE
)
Arguments
x |
Character vector of taxonomic names. |
backend |
Character vector of backend names (e.g., |
fuzzy |
Logical. Enable fuzzy matching for names that fail exact
match. Default TRUE. |
fuzzy_threshold |
Numeric. Maximum allowed distance for fuzzy matches. Two modes depending on the value:
|
fuzzy_method |
Character. One of "dl", "levenshtein", or "jw". |
verbose |
Logical. Print progress messages. Default TRUE. |
Details
When multiple backends are specified, names are matched against each backend in order. Names matched by an earlier backend are not re-matched by later ones (fallback chain).
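You can sketch the fallback chain with plain character vectors standing in for backbones. match_chain() is hypothetical and uses exact membership only. taxify's real matching also covers case-insensitive, fuzzy and abbreviated forms.

```r
# Toy "backbones": plain vectors of names each backend knows (hypothetical).
backends <- list(
  wfo = c("Quercus robur", "Pinus sylvestris"),
  col = c("Quercus robur", "Panthera leo")
)

# Record which backend first matched each name; NA if none did.
match_chain <- function(x, backends) {
  hit <- rep(NA_character_, length(x))
  for (b in names(backends)) {
    todo <- is.na(hit)              # only names still unmatched
    if (!any(todo)) break           # everything resolved, stop early
    found <- todo & x %in% backends[[b]]
    hit[found] <- b
  }
  hit
}

chain <- match_chain(c("Quercus robur", "Panthera leo", "Genus nonexistens"), backends)
```

"Quercus robur" is known to both toy backends but is attributed to "wfo", because later backends only see names that are still unmatched.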
Value
A data.frame with one row per input name and the following columns:
- input_name
The original name as provided.
- matched_name
Full name in the backbone that matched.
- accepted_name
Resolved accepted name (equals matched_name if not a synonym).
- taxon_id
Backend-specific ID of the matched name.
- accepted_id
ID of the accepted name.
- rank
Taxonomic rank (species, subspecies, genus, etc.).
- family
Family name.
- genus
Genus name.
- epithet
Specific epithet.
- authorship
Authorship of the matched name.
- accepted_authorship
Authorship of the accepted name. For a synonym this is the author of the resolved accepted name, not the synonym's own author, so accepted_name and accepted_authorship together form the accepted name's full citation.
- is_synonym
Logical. Was the match a synonym?
- is_hybrid
Logical. Was a hybrid marker detected in the input?
- match_type
One of "exact", "exact_ci", "fuzzy", "abbrev" (an abbreviated genus such as "Q. robur" resolved via genus initial plus epithet), or "none".
- fuzzy_dist
Normalized string distance (0–1), NA if exact.
- is_ambiguous
Logical. TRUE when the matched scientificName had multiple synonym rows pointing to different accepted taxa at the same priority tier (homonym ambiguity). Disambiguated via nomenclaturalStatus = "Valid" when that column is in the backbone; for irreducible ambiguity, the scalar columns hold one candidate.
- ambiguous_targets
Character. "|"-joined list of conflicting accepted taxon IDs when is_ambiguous = TRUE; NA otherwise.
- backend
Which backend was used (e.g., "wfo", "col", "gbif").
- backbone_version
Backend name, version, and download date (e.g., "wfo:2024-12 (2026-04-01)"). Useful for reproducibility.
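To build intuition for fuzzy_dist and fuzzy_threshold, you can compute a normalized edit distance with utils::adist(). This sketch divides the Levenshtein distance by the longer string's length. That normalization is an assumption. Also, adist() does not count an adjacent transposition as a single edit, whereas a Damerau-Levenshtein method would. norm_lev() is a hypothetical helper.

```r
# Normalized Levenshtein distance: edits divided by the longer name's length.
# Assumption for illustration; taxify's exact normalization may differ.
norm_lev <- function(a, b) {
  d <- diag(utils::adist(a, b))   # adist() returns a matrix; keep the a[i] vs b[i] pairs
  d / pmax(nchar(a), nchar(b))
}

d <- norm_lev("Quercus robus", "Quercus robur")  # one substitution over 13 characters
within <- d <= 0.2                               # compared with the default fuzzy_threshold
```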
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
# Match a few names
taxify(c("Quercus robur", "Pinus sylvestris"))
# Disable fuzzy matching
taxify("Quercus robus", fuzzy = FALSE)
# Fallback chain: try WFO first, then COL for unmatched
taxify(c("Quercus robur", "Panthera leo"),
backend = c("wfo", "col"))
options(old)
Clear all cached backbones
Description
Removes all loaded backbone handles from memory. The next call to
taxify() will re-load from disk.
Usage
taxify_clear_cache()
Value
No return value, called for side effects.
Get the taxify data directory
Description
Returns the directory where taxify stores downloaded backbone and
enrichment .vtr files. By default this is the platform-appropriate
per-user cache returned by tools::R_user_dir() (available since R 4.0).
Usage
taxify_data_dir()
Details
The location can be overridden, in order of precedence, by the
taxify.data_dir option (getOption("taxify.data_dir")) or the
TAXIFY_DATA_DIR environment variable. This is useful to point taxify at
a shared cache, or at the small bundled example database returned by
taxify_example_data().
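The documented precedence can be sketched as follows. resolve_data_dir() is a hypothetical helper. Using the "cache" flavour of tools::R_user_dir() is an assumption based on the per-user cache mentioned above.

```r
# Hypothetical re-statement of the documented precedence:
# option first, then environment variable, then the per-user directory.
resolve_data_dir <- function() {
  opt <- getOption("taxify.data_dir")
  if (!is.null(opt) && nzchar(opt)) return(opt)
  env <- Sys.getenv("TAXIFY_DATA_DIR", unset = "")
  if (nzchar(env)) return(env)
  # Assumption: the "cache" flavour of R_user_dir(); requires R >= 4.0.
  tools::R_user_dir("taxify", which = "cache")
}
```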
Value
Character string. Path to the data directory.
Download a backbone database
Description
Downloads the latest Darwin Core snapshot for the specified backend and
converts it to vectra's .vtr format for fast repeated queries.
Usage
taxify_download(backend, dest = NULL, verbose = TRUE, ...)
Arguments
backend |
A |
dest |
Character. Destination directory. Defaults to
|
verbose |
Logical. Print progress messages. |
... |
Additional arguments passed to methods. |
Details
Always re-downloads the latest release, overwriting any existing backbone.
Use taxify() for day-to-day matching — it auto-downloads on first use
and reuses the local copy thereafter.
Value
The path to the .vtr file (invisibly).
Download one or more enrichment .vtr files
Description
Downloads pre-built enrichment .vtr files from the taxify manifest.
Usage
taxify_download_enrichment(enrichment, version = "latest", verbose = TRUE)
Arguments
enrichment |
Character. One or more enrichment names (e.g.,
|
version |
Character. |
verbose |
Logical. Default TRUE. |
Details
Available enrichments:
- conservation_status
IUCN conservation status (LC/NT/VU/EN/CR/EW/EX)
- griis
GRIIS invasive species status by country
- woodiness
Zanne et al. 2014 woody/herbaceous classification
- wcvp
WCVP native range by TDWG botanical region
- eive
EIVE 1.0 ecological indicator values (European plants)
- diaz_traits
Diaz et al. 2022 seed mass and plant height
- elton_traits
EltonTraits 1.0 diet and foraging (birds + mammals)
- avonet
AVONET bird morphology and migration
- pantheria
PanTHERIA mammal life-history traits
- common_names
GBIF vernacular names (multi-language)
- amphibio
AmphiBIO amphibian life-history and ecological traits
- leda
LEDA Traitbase NW European plant traits (Kleyer et al. 2008)
Value
The path(s) to the downloaded .vtr file(s) (invisibly).
Download a taxify backbone
Description
Downloads a pre-built .vtr backbone from Zenodo using the taxify manifest.
Progress is always shown. No prompts are shown — calling this function is
consent.
Usage
taxify_download_vtr(backend = "wfo", version = "latest", verbose = TRUE)
Arguments
backend |
Character. One of |
version |
Character. |
verbose |
Logical. Default TRUE. |
Value
The path(s) to the downloaded .vtr file(s) (invisibly).
Path to the bundled example database
Description
taxify ships a tiny example database (a handful of species per backbone plus matching enrichment tables) so that examples and quick experiments run offline, without downloading the full multi-million-row backbones.
Usage
taxify_example_data()
Details
Point taxify at it for the current session by setting the
taxify.data_dir option:
old <- options(taxify.data_dir = taxify_example_data())
taxify("Quercus robur") |> add_woodiness()
options(old) # restore the real data directory
The example database is read-only and covers only the species used in the package examples; use the full downloaded backbones for real work.
Value
Character string. Path to the bundled example database directory,
or "" if it is not installed.
See Also
Load the unified genus register into memory
Description
Reads genus_register.vtr from disk and caches it as a data.frame in
.taxify_env$register. Subsequent calls reuse the cached version unless
force = TRUE.
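The caching behaviour can be sketched with a private environment. load_register_sketch() is hypothetical. It reads a CSV as a stand-in for genus_register.vtr, since the vectra reader is not shown here.

```r
# Private environment that holds the cached register (hypothetical).
.sketch_env <- new.env(parent = emptyenv())

load_register_sketch <- function(path, force = FALSE) {
  # Reuse the cached copy unless the caller forces a reload.
  if (!force && !is.null(.sketch_env$register)) {
    return(invisible(.sketch_env$register))
  }
  # CSV stands in for the .vtr file read by the real function.
  reg <- utils::read.csv(path, stringsAsFactors = FALSE)
  .sketch_env$register <- reg
  invisible(reg)
}
```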
Usage
taxify_load_register(force = FALSE, verbose = TRUE)
Arguments
force |
Logical. If TRUE, re-read the register from disk even when a cached copy exists. |
verbose |
Logical. Print progress messages. Default TRUE. |
Details
The register contains one row per genus with columns: genus, kingdom,
phylum, class, order, family, life_form.
Value
The register data.frame (invisibly).
Reshape grouped enrichment columns to long format
Description
Converts wide-format columns produced by grouped enrichments (e.g.,
invasive_status_AT, invasive_status_DE) back to long format with
one row per species x group combination.
Usage
taxify_long(x, cols = NULL, group_col = NULL, drop_na = FALSE)
Arguments
x |
A data.frame, typically a taxify_result. |
cols |
Character vector of base column names to reshape. These are
the column names without the group suffix (e.g., |
group_col |
Character. Name for the output group column.
If omitted, auto-detected from enrichment metadata (e.g.,
|
drop_na |
Logical. If |
Details
When cols and group_col are omitted, taxify_long() reads the
reshape metadata attached by grouped enrichment functions
(add_invasive_status(), add_alien_first_records(), add_wcvp(),
add_common_names()). If multiple grouped enrichments were applied,
all are reshaped together (they must share the same group column).
Column matching uses the explicit base names in cols to avoid ambiguity.
For example, given cols = c("alien_first_record", "alien_first_record_source"), the column alien_first_record_source_AT
is correctly matched to base alien_first_record_source (not
alien_first_record with suffix source_AT), because longer base names
are matched first.
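The longest-base-first rule can be sketched as a prefix match over column names. match_grouped_cols() is a hypothetical helper. It only maps each suffixed column to its base name and group code; the pivot to long format is omitted.

```r
# Hypothetical helper: map suffixed columns to (base, group), longest base first.
match_grouped_cols <- function(col_names, bases) {
  bases <- bases[order(nchar(bases), decreasing = TRUE)]   # longest first
  out <- data.frame(column = character(), base = character(), group = character(),
                    stringsAsFactors = FALSE)
  claimed <- rep(FALSE, length(col_names))
  for (b in bases) {
    prefix <- paste0(b, "_")
    hit <- !claimed & startsWith(col_names, prefix)
    if (any(hit)) {
      out <- rbind(out, data.frame(column = col_names[hit], base = b,
                                   group = substring(col_names[hit], nchar(prefix) + 1),
                                   stringsAsFactors = FALSE))
      claimed[hit] <- TRUE   # a column is assigned to one base only
    }
  }
  out
}

m <- match_grouped_cols(
  c("alien_first_record_AT", "alien_first_record_source_AT", "alien_first_record_DE"),
  c("alien_first_record", "alien_first_record_source")
)
```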
If the columns in x exactly match cols (no suffixed variants), the
data is already in single-group format. In this case, the data.frame is
returned unchanged with group_col set to NA.
Value
A data.frame in long format. All columns from x that are not
part of the reshape are preserved. The reshaped columns use their base
names (without suffix), and a new group_col column contains the
group code extracted from the suffix.
Examples
# Runs offline against the bundled example database.
old <- options(taxify.data_dir = taxify_example_data())
# Auto-detected: no cols or group_col needed
taxify("Robinia pseudoacacia") |>
add_alien_first_records(country = c("AT", "DE")) |>
taxify_long()
# Explicit: override auto-detection
taxify("Robinia pseudoacacia") |>
add_invasive_status(country = c("AT", "DE")) |>
taxify_long(cols = "invasive_status", group_col = "country")
options(old)
Invalidate the session manifest cache
Description
Forces the next fetch_manifest() call to re-fetch from the network.
Useful when the maintainer updates the manifest while an R session is
running, so the new manifest is picked up without restarting R.
Usage
taxify_refresh_manifest()
Value
No return value, called for side effects.
Show backend coverage for a genus
Description
Queries backend_coverage.vtr to determine which backends contain the
given genus, along with the backbone version at time of indexing.
Usage
taxify_register_coverage(genus)
Arguments
genus |
Character scalar. The genus name to query. |
Value
A data.frame with columns genus, backend, version,
date_added. Returns a zero-row data.frame if the genus is not found
in any backend.