Publication

Supervised Machine Learning Algorithms in Predicting Damaged Cargo: A Portuguese Logistics & Transportation Company Case Study


Bibliographic details
Abstract: This dissertation aims to predict the likelihood of units and products being damaged during shipment, using supervised machine learning classifiers. Conducted as a case study within a Portuguese transportation company, the research uses real-world data to address several gaps in the literature on damaged-cargo prediction. The main gaps addressed are the use of sampling techniques to overcome imbalanced-dataset issues, the testing of different classifiers on recent data from a verified business source, and the investigation of cargo incidents in road freight transportation within the Portuguese context. To achieve these objectives, the research applies the Synthetic Minority Over-sampling Technique (SMOTE), Adaptive Synthetic (ADASYN) sampling, Random Over Sampling (ROS), and Random Under Sampling (RUS), evaluating their performance against the highly imbalanced original data. Furthermore, the predictive performance of several machine learning classifiers, including Random Forest, Logistic Regression, Gradient Boosting, and K-Nearest Neighbors, is assessed and compared. The findings highlight the superiority of the Random Forest model over the other classifiers, with a combination of ROS and RUS proving to be the most effective resampling technique. Notably, when the model was tested on imbalanced data, the recall score surpassed 0.7, in line with the real-world objective of minimizing misclassification costs. Additionally, the research identifies features with a significant influence on the likelihood of cargo damage, providing valuable insights for optimizing logistics operations. Overall, this dissertation presents a practical application of imbalanced-dataset handling to deepen the understanding of business challenges, contributing to the literature on damaged-cargo prediction.
Main authors: Vale, Alice Lourenço
Subject: Supervised Machine Learning Imbalanced Binary Classification Predictive Analytics
Year: 2024
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Universidade Nova de Lisboa
Language: English
Source: Repositório Institucional da UNL
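The abstract reports that combining ROS with RUS was the most effective resampling approach and that recall was measured on imbalanced data. The sketch below illustrates that general idea using only the Python standard library. It is not the dissertation's implementation: the record does not state the library, the sampling ratios, or the data schema, so the function names `ros_then_rus` and `recall`, the `target_per_class` parameter, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def ros_then_rus(samples, labels, minority_label, target_per_class, seed=0):
    """Balance a binary dataset by combining two random resampling steps.

    ROS: minority rows are duplicated (drawn with replacement) up to
         target_per_class; a minority already at or above the target is kept as-is.
    RUS: majority rows are randomly dropped down to target_per_class.
    The target size is an illustrative assumption, not a value from the source.
    """
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    majority = [(s, y) for s, y in zip(samples, labels) if y != minority_label]
    if not minority or not majority:
        raise ValueError("both classes must be present")

    # Random Over Sampling: add randomly chosen copies of minority rows.
    extra = max(0, target_per_class - len(minority))
    minority_out = minority + [rng.choice(minority) for _ in range(extra)]

    # Random Under Sampling: keep a random subset of majority rows.
    keep = min(target_per_class, len(majority))
    majority_out = rng.sample(majority, keep)

    combined = [(s, minority_label) for s in minority_out] + majority_out
    rng.shuffle(combined)
    return [s for s, _ in combined], [y for _, y in combined]


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive (damaged) cases that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy training set: 5 damaged shipments (label 1) among 100 (hypothetical data).
X_train = [[i] for i in range(100)]
y_train = [1 if i < 5 else 0 for i in range(100)]

X_bal, y_bal = ros_then_rus(X_train, y_train, minority_label=1, target_per_class=20)
print(sorted(Counter(y_bal).items()))  # [(0, 20), (1, 20)]
```

In a setup like this, only the training split would be resampled. The test split keeps its original class imbalance, so that recall reflects the conditions the model would face in operation.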