agriReg provides a streamlined workflow for the most
common regression tasks in agricultural research:
| Task | Function |
|---|---|
| Clean field data | clean_agri_data() |
| Normalise yield | yield_normalize() |
| Flag outliers | outlier_flag() |
| Linear / mixed model | fit_linear() |
| Polynomial selection | fit_polynomial() |
| Nonlinear growth curves | fit_nonlinear() |
| Optimum dose / rate | optimum_dose() |
| Dose-response (LL.4/5) | fit_dose_response() |
| Effective dose (ED50…) | ed_estimates() |
| Compare models | compare_models() |
| Goodness-of-fit stats | gof_stats() |
| Residual diagnostics | residual_check() |
Load one of the bundled example datasets and inspect it.
wheat <- load_example_data("wheat_trial")
head(wheat)
#> block nitrogen phosphorus variety yield biomass
#> 1 A 0 low V1 3.362 4.935
#> 2 A 0 low V2 3.746 5.826
#> 3 A 0 high V1 3.799 5.679
#> 4 A 0 high V2 4.239 6.296
#> 5 A 60 low V1 5.408 7.625
#> 6 A 60 low V2 5.381 7.965wheat_clean <- clean_agri_data(wheat,
yield_col = "yield",
flag_outliers = TRUE)
summary(wheat_clean)
#> agriData summary -- response: 'yield'
#>
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 3.343 5.041 6.097 5.719 6.703 7.328
#>
#> Outliers flagged: 0 / 48Normalise yield within each block using z-scoring:
wheat_norm <- yield_normalize(wheat_clean,
yield_col = "yield",
method = "zscore",
group_by = "block")
head(wheat_norm[, c("block", "yield", "yield_norm")])
#> agriData object
#> Rows : 6
#> Columns : 3
#>
#> First 6 rows:
#> block yield yield_norm
#> 1 A 3.362 -1.8258491
#> 2 A 3.746 -1.5347896
#> 3 A 3.799 -1.4946173
#> 4 A 4.239 -1.1611116
#> 5 A 5.408 -0.2750475
#> 6 A 5.381 -0.2955126The standalone outlier_flag() supports three
methods:
wheat_clean <- outlier_flag(wheat_clean, col = "yield", method = "modz")
table(wheat_clean$yield_outlier)
#>
#> FALSE
#> 48m_lm <- fit_linear(wheat_clean, "yield ~ nitrogen + phosphorus")
print(m_lm)
#> ==============================
#> agriReg: Linear Model (agriLM)
#> ==============================
#> Engine : lm
#> Formula : yield ~ nitrogen + phosphorus
#>
#> Coefficients:
#> (Intercept) nitrogen phosphoruslow
#> 4.4240083 0.0167675 -0.4274167
#>
#> R2: 0.8784 | Adj-R2: 0.8701 | RMSE: 0.4261The plot() method returns ggplot2 objects —
pipe them, save them, or patch them together with
patchwork.
plot(m_lm, type = "fit")plot(m_lm, type = "residuals")plot(m_lm, type = "qq")residual_check(m_lm)Add a random intercept per block to account for field heterogeneity:
m_lmer <- fit_linear(wheat_clean,
formula = "yield ~ nitrogen + phosphorus + variety",
random = "(1|block)")
summary(m_lmer)
#> ==============================
#> agriReg: Linear Model Summary
#> ==============================
#> Engine : lmer
#>
#> Formula: yield ~ nitrogen + phosphorus + variety
#> Random : (1|block)
#>
#> Fixed effects:
#> (Intercept) nitrogen phosphoruslow varietyV2
#> 4.2722167 0.0167675 -0.4274167 0.3035833
#>
#> Random effects (variance components):
#> grp var1 vcov sdcor
#> block (Intercept) 0.000000 0.0000
#> Residual <NA> 0.172902 0.4158
#>
#> Number of obs: 48 | Groups: block, Residual
#>
#> --- Goodness-of-fit ---
#> R2 : 0.8938
#> Adj-R2 : 0.8915
#> RMSE : 0.3981
#> MAE : 0.3771
#> AIC : 80.77
#> BIC : 92.00Automatically test degrees 1–3 and return the best-fitting model:
m_poly <- fit_polynomial(wheat_clean,
x_col = "nitrogen",
y_col = "yield",
max_degree = 3)
coef(m_poly)
#> (Intercept) I(nitrogen^1) I(nitrogen^2)
#> 3.8360916667 0.0354779167 -0.0001039468Load the maize growth dataset:
maize <- load_example_data("maize_growth")
# Use well-watered plants only
maize_ww <- maize[maize$treatment == "WW", ]
head(maize_ww)
#> plant_id days biomass_g treatment
#> 1 1 1 0.00 WW
#> 2 1 2 10.98 WW
#> 3 1 3 0.00 WW
#> 4 1 4 14.10 WW
#> 5 1 5 0.00 WW
#> 6 1 6 0.28 WWThe 3-parameter logistic is the most common growth curve in agronomy. Parameter interpretation:
m_log <- fit_nonlinear(maize_ww,
x_col = "days",
y_col = "biomass_g",
model = "logistic")
summary(m_log)
#> ==================================
#> agriReg: Nonlinear Model Summary
#> ==================================
#> Model : logistic
#>
#>
#> Formula: biomass_g ~ Asym/(1 + exp((xmid - days)/scal))
#>
#> Parameters:
#> Estimate Std. Error t value Pr(>|t|)
#> Asym 497.8621 2.1803 228.34 <2e-16 ***
#> xmid 47.0653 0.1869 251.81 <2e-16 ***
#> scal 8.5507 0.1569 54.52 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 23.27 on 497 degrees of freedom
#>
#> Number of iterations to convergence: 5
#> Achieved convergence tolerance: 2.749e-07
#>
#>
#> --- Goodness-of-fit ---
#> R2 : 0.9869
#> RMSE : 23.1983
#> MAE : 17.2210
#> AIC : 4571.02
#> BIC : 4587.88plot(m_log)The Gompertz is often a better fit for biological growth that is asymmetric around the inflection point:
m_gomp <- fit_nonlinear(maize_ww, "days", "biomass_g", "gompertz")
coef(m_gomp)Classic model for nutrient-response data where yield approaches a plateau:
m_asym <- fit_nonlinear(wheat_clean,
x_col = "nitrogen",
y_col = "yield",
model = "asymptotic")
plot(m_asym)The quadratic polynomial is widely used to find the economic optimum nitrogen rate:
m_quad <- fit_nonlinear(wheat_clean,
x_col = "nitrogen",
y_col = "yield",
model = "quadratic")
opt <- optimum_dose(m_quad)
cat(sprintf("Optimum nitrogen rate : %.1f kg/ha\n", opt["x_opt"]))
#> Optimum nitrogen rate : 170.7 kg/ha
cat(sprintf("Predicted max yield : %.2f t/ha\n", opt["y_max"]))
#> Predicted max yield : 6.86 t/haSuitable when yield increases linearly up to a critical input level and then plateaus (common for phosphorus response):
m_lp <- fit_nonlinear(wheat_clean,
x_col = "nitrogen",
y_col = "yield",
model = "linear_plateau")
cat("Critical point (plateau onset):", optimum_dose(m_lp)["x_opt"], "kg/ha\n")
#> Critical point (plateau onset): 94.86183 kg/haLoad the herbicide dataset:
herb <- load_example_data("herbicide_trial")
# Subset to one species
amaranth <- herb[herb$species == "Amaranth", ]The LL.4 model is the industry standard for herbicide dose-response:
\[y = c + \frac{d - c}{1 + \exp(b(\log(x) - \log(e)))}\]
where b = slope, c = lower asymptote, d = upper asymptote, e = ED50.
m_dr <- fit_dose_response(amaranth,
dose_col = "dose_g_ha",
resp_col = "weed_control_pct",
fct = "LL.4")
summary(m_dr)
#> ==================================
#> agriReg: Dose-Response Summary
#> ==================================
#> Model : LL.4
#> Dose : dose_g_ha
#> Response: weed_control_pct
#>
#>
#> Model fitted: Log-logistic (ED50 as parameter) (4 parms)
#>
#> Parameter estimates:
#>
#> Estimate Std. Error t-value p-value
#> b:(Intercept) 1.905897 0.095867 19.8807 < 2e-16 ***
#> c:(Intercept) 1.576654 0.799879 1.9711 0.05603 .
#> d:(Intercept) 97.986892 0.875390 111.9351 < 2e-16 ***
#> e:(Intercept) 82.259133 2.334523 35.2360 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error:
#>
#> 2.74034 (38 degrees of freedom)
#>
#> ED10 / ED50 / ED90:
#>
#> Estimated effective doses
#>
#> Estimate Std. Error Lower Upper
#> e:1:10 25.9720 1.7769 22.3749 29.5691
#> e:1:50 82.2591 2.3345 77.5331 86.9851
#> e:1:90 260.5332 15.7523 228.6443 292.4221
#> Estimate Std. Error Lower Upper
#> e:1:10 25.97199 1.776870 22.37490 29.56907
#> e:1:50 82.25913 2.334523 77.53314 86.98513
#> e:1:90 260.53321 15.752300 228.64435 292.42208plot(m_dr, log_dose = TRUE)ed_estimates(m_dr, respLev = c(10, 50, 90))
#>
#> Estimated effective doses
#>
#> Estimate Std. Error Lower Upper
#> e:1:10 25.9720 1.7769 22.3749 29.5691
#> e:1:50 82.2591 2.3345 77.5331 86.9851
#> e:1:90 260.5332 15.7523 228.6443 292.4221
#> Estimate Std. Error Lower Upper
#> e:1:10 25.97199 1.776870 22.37490 29.56907
#> e:1:50 82.25913 2.334523 77.53314 86.98513
#> e:1:90 260.53321 15.752300 228.64435 292.42208Compare all models fitted to the wheat data:
cmp <- compare_models(
ols = m_lm,
mixed = m_lmer,
asymptotic = m_asym,
quadratic = m_quad,
lp = m_lp
)
print(cmp)
#> model engine n_par AIC BIC RMSE MAE R2
#> 1 quadratic quadratic 3 27.105 34.590 0.2953 0.2419 0.9416
#> 2 lp linear_plateau 3 33.296 40.781 0.3149 0.2568 0.9335
#> 3 ols lm 3 62.315 69.800 0.4261 0.3753 0.8784
#> 4 mixed lmer 1 80.769 91.997 0.3981 0.3771 0.8938
#> 5 asymptotic asymptotic 2 205.438 211.051 1.9320 1.1392 -1.5010
#> delta_AIC
#> 1 0.000
#> 2 6.191
#> 3 35.210
#> 4 53.664
#> 5 178.333The table is ranked by AIC. delta_AIC shows the
difference from the best model — values below 2 are considered
equivalent; above 10 is strong evidence against the weaker model.
All agriReg objects are compatible with
broom for tidy output:
library(broom)
tidy(m_lm$fit)
#> # A tibble: 3 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 4.42 0.124 35.7 1.13e-34
#> 2 nitrogen 0.0168 0.000947 17.7 6.59e-22
#> 3 phosphoruslow -0.427 0.127 -3.36 1.57e- 3
glance(m_lm$fit)
#> # A tibble: 1 × 12
#> r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.878 0.873 0.440 162. 2.60e-21 2 -27.2 62.3 69.8
#> # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>p <- plot(m_log)
ggplot2::ggsave("logistic_growth.png", p, width = 7, height = 4.5, dpi = 300)sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 26200)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] broom_1.0.12 ggplot2_4.0.2 emmeans_2.0.2 agriReg_0.1.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.1.1 mvtnorm_1.3-6 lattice_0.20-45 tidyr_1.3.2
#> [5] gtools_3.9.5 reformulas_0.4.4 zoo_1.8-15 utf8_1.2.6
#> [9] digest_0.6.39 R6_2.6.1 backports_1.5.0 evaluate_1.0.5
#> [13] coda_0.19-4.1 pillar_1.11.1 Rdpack_2.6.6 rlang_1.1.7
#> [17] otel_0.2.0 multcomp_1.4-30 rstudioapi_0.18.0 minqa_1.2.8
#> [21] car_3.1-5 nloptr_2.2.1 jquerylib_0.1.4 Matrix_1.6-5
#> [25] rmarkdown_2.30 labeling_0.4.3 splines_4.2.1 lme4_2.0-1
#> [29] S7_0.2.1 compiler_4.2.1 xfun_0.57 pkgconfig_2.0.3
#> [33] mgcv_1.8-40 htmltools_0.5.9 tidyselect_1.2.1 tibble_3.3.1
#> [37] codetools_0.2-18 dplyr_1.2.0 withr_3.0.2 MASS_7.3-57
#> [41] rbibutils_2.4.1 grid_4.2.1 nlme_3.1-168 jsonlite_2.0.0
#> [45] xtable_1.8-8 gtable_0.3.6 lifecycle_1.0.5 magrittr_2.0.3
#> [49] scales_1.4.0 estimability_1.5.1 cli_3.6.5 cachem_1.1.0
#> [53] carData_3.0-6 farver_2.1.2 bslib_0.10.0 drc_3.0-1
#> [57] generics_0.1.4 vctrs_0.7.2 boot_1.3-28 sandwich_3.1-1
#> [61] Formula_1.2-5 TH.data_1.1-5 RColorBrewer_1.1-3 tools_4.2.1
#> [65] glue_1.7.0 purrr_1.0.2 plotrix_3.8-14 abind_1.4-8
#> [69] pbkrtest_0.5.5 parallel_4.2.1 fastmap_1.2.0 survival_3.3-1
#> [73] yaml_2.3.12 knitr_1.51 patchwork_1.3.2 sass_0.4.10