EMOTIONS: Ensemble Models fOr lacTatION curveS

Pablo A. S. Fonseca

Instituto de Ganadería de Montaña (CSIC-Univ. de León), 24346 Grulleros, León
p.fonseca@csic.es

Marcos Prates

Department of statistics, Universidade Federal de Minas Gerai, Brasil

Aroa Suárez-Vega

Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León (24007)

Ruth Arribas Gonzalo

Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León (24007)

Beatriz Gutierrez-Gil

Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León (24007)

Juan José Arranz

Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León (24007)

2026-01-28

library(EMOTIONS)

Introduction

The mathematical representation of lactation curves has significant applications in various areas of animal science. Models that simulate milk production under different conditions are valuable for physiologists, nutritionists, and geneticists, allowing them to study mammary gland function and test hypotheses. These models also support management decisions related to timing and efficiency. Over the past decades, numerous models have been proposed for representing lactation curves in dairy species. These models differ mainly in their regression type (linear or nonlinear), number of parameters, relationships among parameters, and ability to represent lactation patterns such as peak yield, time at peak, and persistency.

Despite the advantages of having a diverse range of models, selecting a single model to represent the lactation curve can present limitations. Typically, model selection is based on comparing metrics such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). However, this metric-based approach assumes that the chosen metric is an optimal criterion for model selection. In practice, these metrics may fail to correctly identify the best model, especially when no single model is clearly superior. Furthermore, using a single model selected based on these metrics can lead to overfitting and introduce bias, particularly when the dataset is noisy or contains numerous variables.

Ensemble modeling and model averaging are powerful techniques that enhance robustness, accuracy, and generalization in predictive modeling. Instead of relying on a single model, ensemble methods combine multiple models to reduce variance, mitigate overfitting, and improve predictive performance. Model averaging incorporates model uncertainty by weighting predictions according to their posterior probabilities, leading to more reliable estimates. These approaches are particularly valuable when individual models exhibit varying performance across different datasets or conditions. By integrating multiple models, ensemble methods improve stability and resilience, making them especially useful in complex biological and ecological systems where uncertainty quantification is crucial.

The EMOTIONS package provides a set of tools for fitting 47 different lactation curve models previously reported in the literature. Some of these models and the pre-defined starting parameters were obtained from the lactcurves R package (https://CRAN.R-project.org/package=lactcurves). Once the data is fitted to each model, ensemble predictions are generated using bagging based on AIC, BIC, root mean square percentage error (RMSPE), mean absolute error (MAE), and variance. Additionally, the package provides predictions for daily milk records using Bayesian Model Averaging (BMA) and calculates cosine similarity for each model’s predictions. The ranking of models across individual predictions can be visualized using the RidgeModels and ModelRankRange functions, which help users better understand the weight assigned to each model. Furthermore, the PlotWeightLac function allows users to compare predicted and actual daily milk records. Lastly, EMOTIONS enables the estimation of resilience indicators based on lag-1 autocorrelation, logarithm of residual variance, and residual skewness using predicted daily milking records.

Installing the Package

Use the following code to install EMOTIONS:

install.packages("EMOTIONS")

Analysis

Loading the Package

library(EMOTIONS)

Input Data

EMOTIONS includes a dummy dataset containing daily milk records (up to 210 days) for 100 unique individuals. This dataset can be accessed as follows:

# Load the dummy dataset
data("LacData")

# Display the first rows
head(LacData)
#>    ID DIM      DMY
#> 1 ID2   1 0.920000
#> 2 ID2   2 1.533911
#> 3 ID2   3 1.720438
#> 4 ID2   4 1.968925
#> 5 ID2   5 2.201476
#> 6 ID2   6 2.339388

The dataset consists of three columns:

  • ID: Unique individual identifier
  • DIM: Days in milk
  • DMY: Daily milk yield

Fitting Lactation Curve Models and Generating Ensembles

The core function of EMOTIONS is LacCurveFit. This wrapper function integrates multiple supplementary functions that fit 47 lactation curve models. It takes daily milk production and days in milk from LacData as input. The following arguments must be provided:

  • data: Data frame containing daily milking records
  • ID: Column name containing unique individual IDs
  • trait: Column name containing daily milking records
  • dim: Column name containing days in milk records
  • alpha: Penalization factor (ranging from 0 to 1) for model weight estimation (this parameter should be fine-tuned based on predictions)
  • models: Vector specifying models to include. The default is “All”, which includes all 47 models. Alternatively, users can specify a subset of models.
  • param_list: List specifying model parameters (optional)

Example usage:

# Running model fitting and ensemble modeling
out.ensemble <- LacCurveFit(
  data = LacData, ID = "ID", trait = "DMY",
  dim = "DIM", alpha = 0.1,
  models = "All", param_list = NULL, silent=TRUE
)

The output of LacCurveFit consists of three main lists:

  1. converged_models: Stores the converged models for each individual.
  2. models_weight: Stores model weights and ranks for all ensemble methods.
  3. production: Stores predicted daily milk records for all ensemble methods.

Checking the first six fitted models for individual ID2:

head(out.ensemble$converged_models$ID2)
#> $MM
#> Nonlinear regression model
#>   model: DMY ~ (a * DIM)/(b + DIM)
#>    data: x
#>      a      b 
#>  1.678 -1.426 
#>  residual sum-of-squares: 102.4
#> 
#> Number of iterations to convergence: 16 
#> Achieved convergence tolerance: 8.24e-06
#> 
#> $MMR
#> Nonlinear regression model
#>   model: DMY ~ 1/(1 + a/(b + DIM))
#>    data: x
#>        a        b 
#>  -0.4402 -11.2975 
#>  residual sum-of-squares: 222
#> 
#> Number of iterations to convergence: 10 
#> Achieved convergence tolerance: 7.544e-06
#> 
#> $brody23
#> Nonlinear regression model
#>   model: DMY ~ a * exp(-b * DIM)
#>    data: x
#>        a        b 
#> 2.472794 0.002908 
#>  residual sum-of-squares: 40.12
#> 
#> Number of iterations to convergence: 5 
#> Achieved convergence tolerance: 1.865e-06
#> 
#> $brody24
#> Nonlinear regression model
#>   model: DMY ~ a * exp(-b * DIM) - a * exp(-c * DIM)
#>    data: x
#>        a        b        c 
#> 2.608615 0.003347 0.409834 
#>  residual sum-of-squares: 35.6
#> 
#> Number of iterations to convergence: 6 
#> Achieved convergence tolerance: 1.759e-06
#> 
#> $SCH
#> Nonlinear regression model
#>   model: DMY ~ exp(a + b/DIM)
#>    data: x
#>      a      b 
#> 0.6086 0.0746 
#>  residual sum-of-squares: 61.22
#> 
#> Number of iterations to convergence: 24 
#> Achieved convergence tolerance: 5.521e-06
#> 
#> $SCHL
#> Nonlinear regression model
#>   model: DMY ~ a * exp(b * DIM/(DIM + 1))
#>    data: x
#>       a       b 
#>  2.9461 -0.4806 
#>  residual sum-of-squares: 60.6
#> 
#> Number of iterations to convergence: 12 
#> Achieved convergence tolerance: 6.855e-06

Checking the first six rows of the models_weight list:

head(out.ensemble$models_weight$ID2)
#>     Model      AIC      BIC     RMSPE       MAE delta_AIC delta_BIC delta_RMSPE
#> 17    GS1 248.8716 262.6586 0.4066434 0.2980797  64.06626  60.61952  0.03673873
#> 19     LQ 221.7219 235.5088 0.3835325 0.2764029  36.91651  33.46977  0.01362778
#> 16 PapBo6 236.7595 250.5465 0.3961658 0.2892085  51.95413  48.50739  0.02626116
#> 37 qntReg 184.8054 202.0391 0.4409210 0.2986891   0.00000   0.00000  0.07101635
#> 9     DHA 254.6571 268.4441 0.4117455 0.3025809  69.85173  66.40499  0.04184078
#> 8    wood 254.6571 268.4441 0.4117455 0.3025810  69.85173  66.40499  0.04184078
#>     delta_MAE   AIC_weight   BIC_weight RMSPE_weight MAE_weight   Var_weight
#> 17 0.04155304 0.0010871462 0.0015945608   0.02708594 0.02706331 0.0011616558
#> 19 0.01987616 0.0164205111 0.0240846192   0.02714861 0.02712204 0.0009363592
#> 16 0.03268181 0.0036501551 0.0053538282   0.02711434 0.02708733 0.0009997319
#> 37 0.04216239 0.6586434175 0.6844056210   0.02699326 0.02706166 0.0009299715
#> 9  0.04605421 0.0006095767 0.0008940905   0.02707212 0.02705113 0.0011220894
#> 8  0.04605427 0.0006095767 0.0008940905   0.02707212 0.02705113 0.0011220895
#>    CosSquared_weight  BMA_weight RMSPE_rank MAE_rank AIC_rank BIC_rank BMA_rank
#> 17        0.02717808 0.007028742         22       24       21       19       26
#> 19        0.02717127 0.026882457          8        8        8        7       11
#> 16        0.02716655 0.013334228         12       19       13       10       20
#> 37        0.02716609 0.027536973         33       25        1        1       10
#> 9         0.02716575 0.005804098         26       27       27       27       28
#> 8         0.02716575 0.005804085         24       29       25       25       30
#>    Var_rank CosSquared_rank
#> 17        9               1
#> 19       30               2
#> 16       24               3
#> 37       31               4
#> 9        14               5
#> 8        13               6

Checking the first six rows of the production list:

head(out.ensemble$production$ID2)
#>    ID      DMY DIM weight_AIC weight_BIC Weight_RMSPE weight_MAE weight_BMA
#> 1 ID2 0.920000   1   1.831143   1.875771     1.467181   1.465492   1.194834
#> 2 ID2 1.533911   2   2.052090   2.075498     2.006070   2.007553   1.630353
#> 3 ID2 1.720438   3   2.171604   2.185195     2.096008   2.096483   1.865502
#> 4 ID2 1.968925   4   2.247112   2.254831     2.179450   2.179661   2.011144
#> 5 ID2 2.201476   5   2.299417   2.303232     2.236370   2.236451   2.108247
#> 6 ID2 2.339388   6   2.337805   2.338881     2.274461   2.274465   2.175942
#>   weight_Var weight_CosSquared      SMA
#> 1   1.912619          1.474490 1.463731
#> 2   1.837443          2.000676 2.007605
#> 3   1.811571          2.094073 2.095208
#> 4   1.798359          2.178289 2.177863
#> 5   1.790319          2.235544 2.234341
#> 6   1.784906          2.273813 2.272145

Available Weighting Methods

The ensemble predictions are obtained using different weighting methods:

  • weight_AIC: Weighted by AIC
  • weight_BIC: Weighted by BIC
  • weight_RMSPE: Weighted by RMSPE
  • weight_MAE: Weighted by MAE
  • weight_BMA: Bayesian Model Averaging
  • weight_Var: Weighted by residual variance
  • weight_CosSquared: Weighted by cosine similarity
  • SMA: Simple Model Average (equal weights for all models)

Visualizing Model Ranking

Model ranking across individuals can be visualized using RidgeModels and ModelRankRange.

RidgeModels(out.ensemble, metric = "AIC_rank")

ModelRankRange(out.ensemble, metric = "AIC_rank")
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> ℹ The deprecated feature was likely used in the EMOTIONS package.
#>   Please report the issue to the authors.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

Note that the function ModelRankRange displays the number of individuals which the model was converged in front of each line.

Plotting Actual vs. Predicted Daily Milk Yield

Another visualization option provided by EMOTIONS is the PlotWeightLac function. This function plots the actual daily milk daily production and the predicted values obtained by the ensemble model.The arguments that must be provided to this function are:

  • data: The list generated by the LacCurveFit with the predicted daily milking records
  • ID: The ID of the individual that will have the daily milking records plotted
  • trait: The name of the column containing actual daily milking records
  • metric: The name of the strategy used obtained the predicted values through the ensemble model
  • dim: The name of the column containing days in milk records
  • col: The colors of the actual and predicted values
PlotWeightLac(
  data = out.ensemble, ID = "ID2",
  trait = "DMY", metric = "weight_AIC",
  dim = "DIM", col = c("red", "blue")
)

Customizing Model Selection

The function LacCurveFit allows the customization of the models to be included in the ensemble as well as the parameters of the models.

The EMOTIONS package has a dataset with the list of models, and its respective acronyms, available in the package. This dataset can be accessed using the following command.

data("models_EMOTIONS")
head(models_EMOTIONS)
#>                                   Model Model_acronym
#> 1                      Michaelis-Menten            MM
#> 2               Michaelis-Menten (Rook)           MME
#> 3 Michaelis-Menten + exponential (Rook)           MME
#> 4                          Brody (1923)       brody23
#> 5                          Brody (1924)       brody24
#> 6                            Schumacher           SCH
#>                                             Authors Year
#> 1                     Michaelis, L. and M.L. Menten 1913
#> 2            Rook, A.J., J. France, and M.S. Dhanoa 1993
#> 3            Rook, A.J., J. France, and M.S. Dhanoa 1993
#> 4         Brody, S., A.C. Ragsdale, and C.W. Turner 1923
#> 5          Brody, S., C.W. Tuner, and A.C. Ragsdale 1924
#> 6  Schumacher, F.X., Thornley, J.H.M. and J. France 2007
#>                                                                                                      Reference
#> 1                 https://www.chem.uwec.edu/Chem352_Resources/pages/readings/media/Michaelis_&_Menton_1913.pdf
#> 2                                                                                doi:10.1017/S002185960007684X
#> 3                                                                                doi:10.1017/S002185960007684X
#> 4                                                                                      doi:10.1085/jgp.5.6.777
#> 5                                                        https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2140670/
#> 6 https://books.google.com.au/books/about/Mathematical_Models_in_ Agriculture.html?id=rlwBCRSHobcC&redir_esc=y

Users can select specific models to include in the ensemble. For example, the following models will be used to generate the ensemble: wil, wilk, wilycsml, DiG, DiGpw, legpol3, legpol4, legpolWil, cubsplin3, cubsplin4, cubsplin5, cubsplindef, wilminkPop, and qntReg.

out.ensemble.sub <- LacCurveFit(
  data = LacData,
  ID = "ID",
  trait = "DMY",
  dim = "DIM",
  alpha = 0.1,
  models = c("wil", "wilk", "wilycsml", "DiG", "DiGpw", "legpol3", 
             "legpol4", "legpolWil", "cubsplin3", "cubsplin4", "cubsplin5", 
             "cubsplindef", "wilminkPop", "qntReg"),
  param_list = NULL
)

Evaluating Model Ranks

The ranking of the selected models can be evaluated using a ridge density plot based on AIC scores:

RidgeModels(out.ensemble.sub, metric = "AIC_rank")

Customizing Model Parameters

Users can access the list of models that support parameter customization and their respective parameters using the model_pars dataset:

data(model_pars)
head(model_pars)
#>   Model_acronym                                Parameters
#> 1            MM                         c(a=19.8,b=-1.65)
#> 2           MMR                           c(a=-10,b=-1.4)
#> 3           MME c(a=-0.06608,b=317.49,c=-328.06,d=-0.027)
#> 4       brody23                        c(a=25.6,b=0.0015)
#> 5       brody24             c(a=26.127,b=0.0017,c=0.2575)
#> 6           SCH                               c(a=1,b=16)

For instance, the starting values of the parameters for models MM and wil can be modified as follows:

edited_list <- list(
  MM = c(a = 20, b = -2),
  wil = c(a = 35, b = -5, c = -0.01, k = 0.2)
)

out.ensemble.edited <- LacCurveFit(
  data = LacData,
  ID = "ID",
  trait = "DMY",
  dim = "DIM",
  alpha = 0.1,
  models = "All",
  param_list = edited_list
)

Model convergence can be checked using:

out.ensemble.edited$converged_models$ID2[["MM"]]
#> NULL
out.ensemble.edited$converged_models$ID2[["wil"]]
#> Nonlinear regression model
#>   model: DMY ~ a + b * exp(-k * DIM) + c * DIM
#>    data: x
#>         a         b         c         k 
#>  2.465928 -2.537452 -0.005709  0.482613 
#>  residual sum-of-squares: 37.37
#> 
#> Number of iterations to convergence: 8 
#> Achieved convergence tolerance: 9.786e-06

Calculating Resilience Indicators

The ResInd function calculates resilience estimators, including logarithm of variance, lag-1 autocorrelation, and skewness, based on the weighted predictions. The required parameters are:

  • production_df: A list containing data frames with daily production records (either actual or predicted) from LacCurveFit.
  • dim_filter_range: A vector defining the lower and upper limits for filtering lactation records at the beginning and end of lactation. If no filtering is needed, set the first two values as the minimum DIM and the last two as the maximum DIM.
  • outlier_sd_threshold: A threshold specifying the maximum standard deviations for identifying outlier resilience indicators.
  • weight: The column name containing the selected ensemble prediction (default: weight_AIC).
  • trait: The column name containing daily milk yield records.
  • DIM: The column name containing days in milk records.
  • ID_col: The column name containing unique animal IDs.
out.res <- ResInd(
  out.ensemble$production,
  dim_filter_range = c(1, 7, 203, 210),
  outlier_sd_threshold = 4,
  weight = "weight_AIC",
  trait = "DMY",
  DIM = "DIM",
  ID_col = "ID"
)

The output of ResInd is a list containing:

  1. ri_filtered: A data frame with daily milk production after filtering and the estimated resilience indicators.
  2. dev_list: A list of deviations for each animal and DIM.
  3. removed_samples: A list of animals identified as outliers and removed from analysis.
  4. ri_stats: A data frame summarizing the resilience indicators.

The first six rows of the filtered dataset can be accessed as follows:

head(out.res$ri_filtered)
#> # A tibble: 6 × 5
#>   CodGen log_varianza autocorrelacion_lag1  skewness mean_Prod
#>   <chr>         <dbl>                <dbl>     <dbl>     <dbl>
#> 1 ID102        -1.05                 0.467 -0.000713      3.09
#> 2 ID104        -1.17                 0.493 -0.593         3.76
#> 3 ID105        -1.77                 0.494  0.214         2.18
#> 4 ID121        -0.553                0.460  1.06          3.22
#> 5 ID125        -2.30                 0.452  1.58          1.91
#> 6 ID126        -1.50                 0.498  1.15          2.86

The summary statistics for the resilience indicators can be retrieved using:

out.res$ri_stats
#> # A tibble: 4 × 5
#>   indicator              mean    sd     min     max
#>   <chr>                 <dbl> <dbl>   <dbl>   <dbl>
#> 1 log_varianza         -1.91  0.798 -3.65   -0.0899
#> 2 autocorrelacion_lag1  0.488 0.134  0.0310  0.783 
#> 3 skewness              0.552 0.863 -1.30    2.75  
#> 4 mean_Prod             2.49  0.817  0.849   4.65

Imputation of missing records

The EMOTIONS package also allows the uer to impute missing MY using the ensemble created. For this, the function imp_my must be used. The function receive as input the list containing the data frames with the daily production records obtained from the LacCurveFit function and the a vector with the days in milk where the milk yield will be imputed. It can contain observed and missing DIM. Below we will use a example the records from ID2. The MY from days 32,34,36,37 and 40 will be masked and the whole 1-305 DIM MY will be predicted using the ensemble.


#Saving the original values
original.MY<-LacData[which(LacData$ID=="ID2" & LacData$DIM
                           %in%c(32,34,36,37,40)),
                     "DMY"]

#Masking the original values
LacData.imp<-LacData

LacData.imp[which(LacData.imp$ID=="ID2" & LacData.imp$DIM
                           %in%c(32,34,36,37,40)),
                     "DMY"]<-NA

LacData.imp<-LacData.imp[!is.na(LacData.imp$DMY),]

# Running model fitting and ensemble modeling
out.ensemble.imp <- LacCurveFit(
  data = LacData.imp, ID = "ID", trait = "DMY",
  dim = "DIM", alpha = 0.1,
  models = "All", param_list = NULL, silent=TRUE
)

#Imputing missing masked MY
out.imp<-imp_my(out.ind = out.ensemble.imp, dim = 1:305)

#Checking output
out.imp[which(out.imp$ID=="ID2" & out.imp$DIM
                           %in%c(32,34,36,37,40)),]
#>     ID DIM MY_ensem MY_real
#> 32 ID2  32 2.546363      NA
#> 34 ID2  34 2.533940      NA
#> 36 ID2  36 2.520441      NA
#> 37 ID2  37 2.513322      NA
#> 40 ID2  40 2.490631      NA

#Comparing with the original values
original.MY
#> [1] 2.534975 1.656477 2.370270 2.257203 2.731070

Identification of milk loss events and estimation of resilience indicators

The function milkloss_detect allows the users to identify milk loss events based on different criteria.

  • “pctbase”: recovery when the observed value reaches a given fraction of the baseline (rec), for a given number of consecutive days (stick);

  • “band”: recovery when the observation is inside a tolerance band around the baseline (±tol), for at least stick consecutive days;

  • “resid”: recovery when the residual has improved enough from the nadir (by a fraction rec of the nadir’s absolute residual) for stick consecutive days.

The function computes several descriptors of milk-yield perturbation episodes.

  1. Nadir (day of minimum)

The worst day inside the episode (deepest point of the perturbation).

t_hat = argmin over t in [t_start, t_end] of obs(t)

Nadir = obs(t_hat)

where ‘t_start’ and ‘t_end’ are the episode boundaries.

  1. Amplitude (drop)

Depth of the dip relative to the baseline at the episode’s start.

A = baseline(t_start) - obs(t_hat)

Some variants use ‘baseline(t_hat)’ instead of ‘baseline(t_start)’; here the start of the episode is used as the reference.

  1. ML_per_event (AUD)

Total milk lost (in baseline units) over the episode, i.e., the integrated milk deficit.

ML_per_event = AUD = sum over t from t_start to t_end of [baseline(t) - obs(t)]

In discrete data, AUD is computed with day-weighting: each observation contributes

(baseline(t) - obs(t)) * delta_days

where ‘delta_days’ is the gap to the next observed DIM (last day weight = 1).

  1. Time-to-baseline (TTB)

Time after the nadir until the profile returns to (and stays near) the baseline.

Recovery is declared when ‘obs(t)’ re-enters a tolerance band around the baseline and stays there for ‘r’ consecutive days (controlled by arguments such as ‘tol’ and ‘stick’).

We find the smallest ‘tau >= 0’ such that for all ‘u’ in the interval from ‘t_hat + tau’ to ‘t_hat + tau + r - 1’:

abs( obs(u) - baseline(u) ) <= tol * baseline(u)

Then: ‘TTB = tau’.

If this condition is never satisfied before DIM 305, ‘TTB’ is set to ‘NA’ (right-censored).

  1. Recovery half-life (t_1_2)

Earliest time after nadir when half of the drop has been recovered.

With amplitude ‘A’ as above, define the half-recovery level:

L_half = baseline(t_start) - A / 2

Then:

t_1_2 = minimum tau >= 0 such that obs(t_hat + tau) >= L_half

  1. Slopes (decline and recovery)

Average daily change during the decline into the nadir and during early recovery, summarizing the episode shape.

For a ‘K’-day local window:

DeclineSlope = ( obs( min(t_hat, t_start + K) ) - obs(t_start) ) / ( min(t_hat, t_start + K) - t_start )

RecoverySlope = ( obs( min(t_end, t_hat + K) ) - obs(t_hat) ) / ( min(t_end, t_hat + K) - t_hat )

  1. AUC_deviation

Trapezoidal area under the curve of the milk deficit ‘baseline(t) - obs(t)’ across the whole episode. It summarizes how much milk was lost and for how long (volume × duration).

Conceptually:

AUC_deviation = integral of [baseline(t) - obs(t)] dt from t_start to t_end

In practice this is approximated via the trapezoidal rule on discrete DIMs.

  1. prod_decline_slope_amp

Product of the decline slope (anchor → nadir) and the amplitude (anchor − nadir). It combines speed and depth of the decline into a single indicator of how “aggressive” the drop is.

prod_decline_slope_amp = DeclineSlope * A

  1. prod_recovery_slope_TTB

Product of the recovery slope (nadir → recovery) and Time-to-baseline (TTB). It combines how fast the animal recovers with how long recovery takes, summarizing recovery efficiency.

prod_recovery_slope_TTB = RecoverySlope * TTB

The function returns a list with two data frames containing:

episodes: individual milk loss events and their resilience indicators;

aggregates: milk loss events aggregated per individual.

#Creating a input file based on data frame with all ensemble predictions
out.imp.ensemb <- do.call(rbind, out.ensemble$production)

#Estimating the milk loss events
res.ensem <- milkloss_detect(
  data = out.imp.ensemb,
  id_col = "ID",
  dim_col = "DIM",
  MY_col = "DMY",
  MY_pred = "weight_AIC",
  dim_start = 11, dim_end = 294,
  drop_pct = 0.10, min_len = 1,
  rec_mode = "pctbase", rec = 0.99, stick = 2
)

#Checking the output
str(res.ensem)
#> List of 2
#>  $ episodes  :'data.frame':  1545 obs. of  19 variables:
#>   ..$ ID                         : chr [1:1545] "ID102" "ID102" "ID102" "ID102" ...
#>   ..$ episode_index              : int [1:1545] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ start_DIM                  : int [1:1545] 11 15 40 43 70 78 82 86 87 88 ...
#>   ..$ end_DIM                    : int [1:1545] 14 33 42 44 76 79 85 86 87 90 ...
#>   ..$ nadir_DIM                  : int [1:1545] 11 21 40 43 70 78 83 86 87 88 ...
#>   ..$ recovery_DIM               : int [1:1545] 14 33 42 44 76 79 85 86 87 90 ...
#>   ..$ peak_drop_pct              : num [1:1545] 0.134 0.244 0.743 0.565 0.3 ...
#>   ..$ anchor_ref                 : num [1:1545] 2.91 2.85 2.85 2.92 3.88 ...
#>   ..$ amplitude_anchor           : num [1:1545] 0.392 0.697 2.114 1.653 1.167 ...
#>   ..$ ML_per_event               : num [1:1545] 0.596 5.624 3.277 1.653 3.224 ...
#>   ..$ AUC_deviation              : num [1:1545] 0.711 5.475 2.24 0.841 4.212 ...
#>   ..$ TTB_days                   : num [1:1545] 3 12 2 1 6 1 2 0 0 2 ...
#>   ..$ t_half_anchor_days         : num [1:1545] 3 8 2 1 1 0 1 0 0 1 ...
#>   ..$ decline_slope              : num [1:1545] NA -0.11 -2.53 -1.68 -1.2 ...
#>   ..$ recovery_slope             : num [1:1545] 0.2887 0.0195 1.1103 1.7435 0.2722 ...
#>   ..$ prod_decline_slope_amp     : num [1:1545] NA -0.0764 -5.3463 -2.7822 -1.4013 ...
#>   ..$ prod_recovery_slope_TTB    : num [1:1545] 0.866 0.234 2.221 1.744 1.633 ...
#>   ..$ days_to_nadir              : num [1:1545] 0 6 0 0 0 0 1 0 0 0 ...
#>   ..$ days_from_nadir_to_recovery: num [1:1545] 3 12 2 1 6 1 2 0 0 2 ...
#>  $ aggregates:'data.frame':  100 obs. of  10 variables:
#>   ..$ ID                          : chr [1:100] "ID102" "ID104" "ID105" "ID121" ...
#>   ..$ n_episodes                  : int [1:100] 26 17 19 19 21 19 22 15 15 19 ...
#>   ..$ total_ML                    : num [1:100] 36.9 44.8 23.9 59.1 14.7 ...
#>   ..$ total_auc_dev               : num [1:100] 35.1 42.2 22.8 59.2 16.4 ...
#>   ..$ total_days                  : int [1:100] 77 74 52 95 84 75 105 74 66 103 ...
#>   ..$ mean_amp                    : num [1:100] 0.778 1.057 0.568 1.116 0.409 ...
#>   ..$ mean_thalf                  : num [1:100] 1.36 1 1 1.89 1.14 ...
#>   ..$ median_ttb                  : num [1:100] 2 3 1.5 2 2 2 3 3 2 3 ...
#>   ..$ mean_prod_decline_slope_amp : num [1:100] -0.992 -0.855 -0.352 -0.813 -0.174 ...
#>   ..$ mean_prod_recovery_slope_TTB: num [1:100] 0.927 1.053 0.767 1.292 0.561 ...

The EMOTIONS package also contains a function, named PlotMilkLoss, that allows to plot the milk loss events identified by the milkloss_detect function. The arguments of PlotMilkLoss are:

data: A data frame containing the observed and predicted daily milking records

ID: The ID of the individual that will have the daily milking records plotted

res.milkloss: The object with the output of milkloss_detect function

MY_col: The name of the column containing the observed milk yield

MY_pred: The name of the column containing the predicted milk yield

col: The colors of the actual, predicted values, and milk loss events. In this order

id_col: The name of the column containing the individual IDs

PlotMilkLoss(data=out.imp.ensemb, 
                       ID="ID2", 
                       res.milkloss=res.ensem,
                       MY_col="DMY", 
                       MY_pred="weight_AIC",
                       col = c("red", "blue", "darkgreen"),
                       id_col = "ID")