Publication

Machine learning approaches for tomato crop yield prediction in precision agriculture

Bibliographic details
Abstract: The objective of this project was to apply machine learning (ML) techniques to predict the yield of processing tomato crops from information on soil properties, weather conditions, and applied fertilizers. Besides being robust enough to predict tomato productivity, the model needed to be interpretable and transparent for the business. The models assessed were Decision Tree Regression, ensemble bagging models such as Random Forest Regression, boosting techniques such as Gradient Boosting Regression, and Support Vector Regression. Overall, the Gradient Boosting and Support Vector models performed best. To improve predictive power, we combined the predictions of these two models in a stacked approach with a Ridge Regression as the final model. The generalization error of the final model on new data was 9.02 ton/ha for the mean absolute error (MAE), 9.5% for the mean absolute percentage error (MAPE), and 13.5 ton/ha for the root mean squared error (RMSE). In other words, the model predicts tomato crop yield with an average absolute error of approximately 9 ton/ha. Although the final model was complex and not intrinsically interpretable, we applied model-agnostic interpretation methods: the SHAP summary plot, to better understand feature importance and feature effects, and the Accumulated Local Effects (ALE) plot, to explain how features influence the model's outcome on average. In general, the objectives of the project were accomplished, and the company was satisfied with the model's results and their interpretation.
Main authors: Suescún, María Fernanda Restrepo
Subject: Yield prediction; Tomato; Agriculture; Machine learning; Ensemble learning
Year: 2021
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Universidade Nova de Lisboa
Language: English
Source: Repositório Institucional da UNL
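
The abstract reports the generalization error with three metrics (MAE, MAPE, and RMSE). As a minimal sketch of how these are commonly defined, the following Python computes them for a small set of made-up yields. The function names and the sample values are illustrative assumptions and are not taken from the dissertation or its data.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the target (e.g. ton/ha)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical observed and predicted yields in ton/ha (illustrative values only).
observed = [100.0, 80.0, 120.0, 90.0]
predicted = [90.0, 88.0, 126.0, 90.0]

print(f"MAE:  {mae(observed, predicted):.2f} ton/ha")
print(f"MAPE: {mape(observed, predicted):.2f} %")
print(f"RMSE: {rmse(observed, predicted):.2f} ton/ha")
```

Because RMSE squares each error before averaging, it weights large errors more heavily than MAE and is never smaller than MAE on the same predictions, which is consistent with the reported RMSE (13.5 ton/ha) exceeding the reported MAE (9.02 ton/ha).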