Getting started with agriReg

Overview

agriReg provides a streamlined workflow for the most common regression tasks in agricultural research:

Task	Function
Clean field data	`clean_agri_data()`
Normalise yield	`yield_normalize()`
Flag outliers	`outlier_flag()`
Linear / mixed model	`fit_linear()`
Polynomial selection	`fit_polynomial()`
Nonlinear growth curves	`fit_nonlinear()`
Optimum dose / rate	`optimum_dose()`
Dose-response (LL.4/5)	`fit_dose_response()`
Effective dose (ED50…)	`ed_estimates()`
Compare models	`compare_models()`
Goodness-of-fit stats	`gof_stats()`
Residual diagnostics	`residual_check()`

1 · Data preparation

Load one of the bundled example datasets and inspect it.

wheat <- load_example_data("wheat_trial")
head(wheat)
#>   block nitrogen phosphorus variety yield biomass
#> 1     A        0        low      V1 3.362   4.935
#> 2     A        0        low      V2 3.746   5.826
#> 3     A        0       high      V1 3.799   5.679
#> 4     A        0       high      V2 4.239   6.296
#> 5     A       60        low      V1 5.408   7.625
#> 6     A       60        low      V2 5.381   7.965

Cleaning and outlier detection

wheat_clean <- clean_agri_data(wheat,
                                yield_col     = "yield",
                                flag_outliers = TRUE)
summary(wheat_clean)
#> agriData summary -- response: 'yield'
#> 
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   3.343   5.041   6.097   5.719   6.703   7.328 
#> 
#> Outliers flagged: 0 / 48

Normalisation

Normalise yield within each block using z-scoring:

wheat_norm <- yield_normalize(wheat_clean,
                               yield_col = "yield",
                               method    = "zscore",
                               group_by  = "block")
head(wheat_norm[, c("block", "yield", "yield_norm")])
#> agriData object
#>   Rows      : 6
#>   Columns   : 3
#> 
#> First 6 rows:
#>   block yield yield_norm
#> 1     A 3.362 -1.8258491
#> 2     A 3.746 -1.5347896
#> 3     A 3.799 -1.4946173
#> 4     A 4.239 -1.1611116
#> 5     A 5.408 -0.2750475
#> 6     A 5.381 -0.2955126

Advanced outlier flagging

The standalone outlier_flag() supports three methods:

wheat_clean <- outlier_flag(wheat_clean, col = "yield", method = "modz")
table(wheat_clean$yield_outlier)
#> 
#> FALSE 
#>    48

2 · Linear models

Simple OLS

m_lm <- fit_linear(wheat_clean, "yield ~ nitrogen + phosphorus")
print(m_lm)
#> ==============================
#>  agriReg: Linear Model (agriLM)
#> ==============================
#> Engine  : lm 
#> Formula : yield ~ nitrogen + phosphorus 
#> 
#> Coefficients:
#>   (Intercept)      nitrogen phosphoruslow 
#>     4.4240083     0.0167675    -0.4274167 
#> 
#> R2: 0.8784  |  Adj-R2: 0.8701  |  RMSE: 0.4261

Diagnostic plots

The plot() method returns ggplot2 objects — pipe them, save them, or patch them together with patchwork.

plot(m_lm, type = "fit")

plot(m_lm, type = "residuals")

plot(m_lm, type = "qq")

Full residual panel

residual_check(m_lm)

Mixed-effects model

Add a random intercept per block to account for field heterogeneity:

m_lmer <- fit_linear(wheat_clean,
                     formula = "yield ~ nitrogen + phosphorus + variety",
                     random  = "(1|block)")
summary(m_lmer)
#> ==============================
#>  agriReg: Linear Model Summary
#> ==============================
#> Engine : lmer 
#> 
#> Formula: yield ~ nitrogen + phosphorus + variety 
#> Random : (1|block) 
#> 
#> Fixed effects:
#>   (Intercept)      nitrogen phosphoruslow     varietyV2 
#>     4.2722167     0.0167675    -0.4274167     0.3035833 
#> 
#> Random effects (variance components):
#>       grp        var1     vcov  sdcor
#>     block (Intercept) 0.000000 0.0000
#>  Residual        <NA> 0.172902 0.4158
#> 
#> Number of obs: 48 | Groups: block, Residual 
#> 
#> --- Goodness-of-fit ---
#> R2       : 0.8938
#> Adj-R2   : 0.8915
#> RMSE     : 0.3981
#> MAE      : 0.3771
#> AIC      : 80.77
#> BIC      : 92.00

Polynomial selection

Automatically test degrees 1–3 and return the best-fitting model:

m_poly <- fit_polynomial(wheat_clean,
                          x_col      = "nitrogen",
                          y_col      = "yield",
                          max_degree = 3)
coef(m_poly)
#>   (Intercept) I(nitrogen^1) I(nitrogen^2) 
#>  3.8360916667  0.0354779167 -0.0001039468

3 · Nonlinear models

Load the maize growth dataset:

maize <- load_example_data("maize_growth")
# Use well-watered plants only
maize_ww <- maize[maize$treatment == "WW", ]
head(maize_ww)
#>   plant_id days biomass_g treatment
#> 1        1    1      0.00        WW
#> 2        1    2     10.98        WW
#> 3        1    3      0.00        WW
#> 4        1    4     14.10        WW
#> 5        1    5      0.00        WW
#> 6        1    6      0.28        WW

Logistic growth

The 3-parameter logistic is the most common growth curve in agronomy. Parameter interpretation:

Asym: upper asymptote (maximum biomass)
xmid: inflection point (day of maximum growth rate)
scal: scale parameter (related to the steepness)

m_log <- fit_nonlinear(maize_ww,
                        x_col = "days",
                        y_col = "biomass_g",
                        model = "logistic")
summary(m_log)
#> ==================================
#>  agriReg: Nonlinear Model Summary
#> ==================================
#> Model : logistic 
#> 
#> 
#> Formula: biomass_g ~ Asym/(1 + exp((xmid - days)/scal))
#> 
#> Parameters:
#>      Estimate Std. Error t value Pr(>|t|)    
#> Asym 497.8621     2.1803  228.34   <2e-16 ***
#> xmid  47.0653     0.1869  251.81   <2e-16 ***
#> scal   8.5507     0.1569   54.52   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 23.27 on 497 degrees of freedom
#> 
#> Number of iterations to convergence: 5 
#> Achieved convergence tolerance: 2.749e-07
#> 
#> 
#> --- Goodness-of-fit ---
#> R2   : 0.9869
#> RMSE : 23.1983
#> MAE  : 17.2210
#> AIC  : 4571.02
#> BIC  : 4587.88

plot(m_log)

Gompertz curve

The Gompertz is often a better fit for biological growth that is asymmetric around the inflection point:

m_gomp <- fit_nonlinear(maize_ww, "days", "biomass_g", "gompertz")
coef(m_gomp)

Asymptotic (Mitscherlich) model

Classic model for nutrient-response data where yield approaches a plateau:

m_asym <- fit_nonlinear(wheat_clean,
                         x_col = "nitrogen",
                         y_col = "yield",
                         model = "asymptotic")
plot(m_asym)

Quadratic and optimum dose

The quadratic polynomial is widely used to find the economic optimum nitrogen rate:

m_quad <- fit_nonlinear(wheat_clean,
                         x_col = "nitrogen",
                         y_col = "yield",
                         model = "quadratic")

opt <- optimum_dose(m_quad)
cat(sprintf("Optimum nitrogen rate : %.1f kg/ha\n", opt["x_opt"]))
#> Optimum nitrogen rate : 170.7 kg/ha
cat(sprintf("Predicted max yield   : %.2f t/ha\n",  opt["y_max"]))
#> Predicted max yield   : 6.86 t/ha

Linear-plateau model

Suitable when yield increases linearly up to a critical input level and then plateaus (common for phosphorus response):

m_lp <- fit_nonlinear(wheat_clean,
                       x_col = "nitrogen",
                       y_col = "yield",
                       model = "linear_plateau")
cat("Critical point (plateau onset):", optimum_dose(m_lp)["x_opt"], "kg/ha\n")
#> Critical point (plateau onset): 94.86183 kg/ha

4 · Dose-response models

Load the herbicide dataset:

herb <- load_example_data("herbicide_trial")
# Subset to one species
amaranth <- herb[herb$species == "Amaranth", ]

Fit a 4-parameter log-logistic curve

The LL.4 model is the industry standard for herbicide dose-response:

\[y = c + \frac{d - c}{1 + \exp(b(\log(x) - \log(e)))}\]

where b = slope, c = lower asymptote, d = upper asymptote, e = ED50.

m_dr <- fit_dose_response(amaranth,
                           dose_col = "dose_g_ha",
                           resp_col = "weed_control_pct",
                           fct      = "LL.4")
summary(m_dr)
#> ==================================
#>  agriReg: Dose-Response Summary
#> ==================================
#> Model   : LL.4 
#> Dose    : dose_g_ha 
#> Response: weed_control_pct 
#> 
#> 
#> Model fitted: Log-logistic (ED50 as parameter) (4 parms)
#> 
#> Parameter estimates:
#> 
#>                Estimate Std. Error  t-value p-value    
#> b:(Intercept)  1.905897   0.095867  19.8807 < 2e-16 ***
#> c:(Intercept)  1.576654   0.799879   1.9711 0.05603 .  
#> d:(Intercept) 97.986892   0.875390 111.9351 < 2e-16 ***
#> e:(Intercept) 82.259133   2.334523  35.2360 < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error:
#> 
#>  2.74034 (38 degrees of freedom)
#> 
#> ED10 / ED50 / ED90:
#> 
#> Estimated effective doses
#> 
#>        Estimate Std. Error    Lower    Upper
#> e:1:10  25.9720     1.7769  22.3749  29.5691
#> e:1:50  82.2591     2.3345  77.5331  86.9851
#> e:1:90 260.5332    15.7523 228.6443 292.4221
#>         Estimate Std. Error     Lower     Upper
#> e:1:10  25.97199   1.776870  22.37490  29.56907
#> e:1:50  82.25913   2.334523  77.53314  86.98513
#> e:1:90 260.53321  15.752300 228.64435 292.42208

plot(m_dr, log_dose = TRUE)

Effective dose estimates

ed_estimates(m_dr, respLev = c(10, 50, 90))
#> 
#> Estimated effective doses
#> 
#>        Estimate Std. Error    Lower    Upper
#> e:1:10  25.9720     1.7769  22.3749  29.5691
#> e:1:50  82.2591     2.3345  77.5331  86.9851
#> e:1:90 260.5332    15.7523 228.6443 292.4221
#>         Estimate Std. Error     Lower     Upper
#> e:1:10  25.97199   1.776870  22.37490  29.56907
#> e:1:50  82.25913   2.334523  77.53314  86.98513
#> e:1:90 260.53321  15.752300 228.64435 292.42208

5 · Model comparison

Compare all models fitted to the wheat data:

cmp <- compare_models(
  ols        = m_lm,
  mixed      = m_lmer,
  asymptotic = m_asym,
  quadratic  = m_quad,
  lp         = m_lp
)
print(cmp)
#>        model         engine n_par     AIC     BIC   RMSE    MAE      R2
#> 1  quadratic      quadratic     3  27.105  34.590 0.2953 0.2419  0.9416
#> 2         lp linear_plateau     3  33.296  40.781 0.3149 0.2568  0.9335
#> 3        ols             lm     3  62.315  69.800 0.4261 0.3753  0.8784
#> 4      mixed           lmer     1  80.769  91.997 0.3981 0.3771  0.8938
#> 5 asymptotic     asymptotic     2 205.438 211.051 1.9320 1.1392 -1.5010
#>   delta_AIC
#> 1     0.000
#> 2     6.191
#> 3    35.210
#> 4    53.664
#> 5   178.333

The table is ranked by AIC. delta_AIC shows the difference from the best model — values below 2 are considered equivalent; above 10 is strong evidence against the weaker model.

6 · Extracting results for reports

All agriReg objects are compatible with broom for tidy output:

library(broom)
tidy(m_lm$fit)
#> # A tibble: 3 × 5
#>   term          estimate std.error statistic  p.value
#>   <chr>            <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)     4.42    0.124        35.7  1.13e-34
#> 2 nitrogen        0.0168  0.000947     17.7  6.59e-22
#> 3 phosphoruslow  -0.427   0.127        -3.36 1.57e- 3
glance(m_lm$fit)
#> # A tibble: 1 × 12
#>   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
#>       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
#> 1     0.878         0.873 0.440      162. 2.60e-21     2  -27.2  62.3  69.8
#> # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Saving plots

p <- plot(m_log)
ggplot2::ggsave("logistic_growth.png", p, width = 7, height = 4.5, dpi = 300)

Session information

sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 26200)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=C                          
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] broom_1.0.12  ggplot2_4.0.2 emmeans_2.0.2 agriReg_0.1.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.1.1         mvtnorm_1.3-6      lattice_0.20-45    tidyr_1.3.2       
#>  [5] gtools_3.9.5       reformulas_0.4.4   zoo_1.8-15         utf8_1.2.6        
#>  [9] digest_0.6.39      R6_2.6.1           backports_1.5.0    evaluate_1.0.5    
#> [13] coda_0.19-4.1      pillar_1.11.1      Rdpack_2.6.6       rlang_1.1.7       
#> [17] otel_0.2.0         multcomp_1.4-30    rstudioapi_0.18.0  minqa_1.2.8       
#> [21] car_3.1-5          nloptr_2.2.1       jquerylib_0.1.4    Matrix_1.6-5      
#> [25] rmarkdown_2.30     labeling_0.4.3     splines_4.2.1      lme4_2.0-1        
#> [29] S7_0.2.1           compiler_4.2.1     xfun_0.57          pkgconfig_2.0.3   
#> [33] mgcv_1.8-40        htmltools_0.5.9    tidyselect_1.2.1   tibble_3.3.1      
#> [37] codetools_0.2-18   dplyr_1.2.0        withr_3.0.2        MASS_7.3-57       
#> [41] rbibutils_2.4.1    grid_4.2.1         nlme_3.1-168       jsonlite_2.0.0    
#> [45] xtable_1.8-8       gtable_0.3.6       lifecycle_1.0.5    magrittr_2.0.3    
#> [49] scales_1.4.0       estimability_1.5.1 cli_3.6.5          cachem_1.1.0      
#> [53] carData_3.0-6      farver_2.1.2       bslib_0.10.0       drc_3.0-1         
#> [57] generics_0.1.4     vctrs_0.7.2        boot_1.3-28        sandwich_3.1-1    
#> [61] Formula_1.2-5      TH.data_1.1-5      RColorBrewer_1.1-3 tools_4.2.1       
#> [65] glue_1.7.0         purrr_1.0.2        plotrix_3.8-14     abind_1.4-8       
#> [69] pbkrtest_0.5.5     parallel_4.2.1     fastmap_1.2.0      survival_3.3-1    
#> [73] yaml_2.3.12        knitr_1.51         patchwork_1.3.2    sass_0.4.10