Publication
Supervised Machine Learning Algorithms in Predicting Damaged Cargo: A Portuguese Logistics & Transportation Company Case Study
| Abstract: | This dissertation aims to predict the likelihood of units and products being damaged during shipment, using supervised machine learning classifiers. Conducted as a case study within a Portuguese transportation company, the research employs real-world data to address several gaps in the literature on damaged-cargo prediction. These gaps involve incorporating sampling techniques to overcome imbalanced-dataset issues, testing different classifiers with recent data from a verified business source, and investigating cargo incidents in road freight transportation within the Portuguese context. To achieve these objectives, the research applies the Synthetic Minority Over-sampling Technique (SMOTE), Adaptive Synthetic (ADASYN), Random Over Sampling (ROS), and Random Under Sampling (RUS) techniques, evaluating their performance against the highly imbalanced original data. Furthermore, the predictive performance of machine learning classifiers, including Random Forest, Logistic Regression, Gradient Boosting, and K-Nearest Neighbors, is assessed and compared. The findings highlight the superiority of the Random Forest model over the other classifiers, with a combination of ROS and RUS proving to be the most effective resampling technique. Notably, when the model was tested on imbalanced data, the recall score surpassed 0.7, in line with the real-world objective of minimizing misclassification costs. Additionally, the research identifies features with significant influence on the likelihood of cargo suffering damage, providing valuable insights for optimizing logistics operations. Overall, this dissertation presents a practical application of handling imbalanced datasets to deepen understanding of business challenges, contributing to advancements in the literature on damaged-cargo prediction. |
|---|---|
| Main authors: | Vale, Alice Lourenço |
| Subject: | Supervised Machine Learning; Imbalanced Binary Classification; Predictive Analytics |
| Year: | 2024 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |
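The abstract reports that combining ROS with RUS was the most effective resampling strategy and that recall was the headline metric. The following is a minimal, standard-library sketch of that general idea, not the dissertation's implementation: the function names, the `over_ratio` parameter, the 1:1 final balance, and the toy data are all assumptions made for illustration, since the record does not state the ratios or tooling actually used.

```python
import random
from collections import Counter


def combine_ros_rus(X, y, minority_label=1, over_ratio=0.5, seed=0):
    """Random over-sampling (ROS) of the minority class followed by
    random under-sampling (RUS) of the majority class.

    over_ratio is a hypothetical parameter: the minority class is first
    grown, by duplicating rows at random, to over_ratio * majority size;
    the majority class is then shrunk, by dropping rows at random, to the
    new minority size, which yields a 1:1 training set.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    if not minority or not majority:
        raise ValueError("both classes must be present")

    # ROS: draw extra minority rows with replacement.
    target = max(len(minority), int(over_ratio * len(majority)))
    minority = minority + rng.choices(minority, k=target - len(minority))

    # RUS: keep a random subset of majority rows, without replacement.
    majority = rng.sample(majority, k=min(len(majority), len(minority)))

    keep = minority + majority
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that were predicted positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data (invented): 5 damaged shipments (label 1) among 100.
X = [[i] for i in range(100)]
y = [1 if i < 5 else 0 for i in range(100)]
X_res, y_res = combine_ros_rus(X, y, over_ratio=0.5)
class_counts = Counter(y_res)
```

Resampling of this kind would normally be applied to the training split only, with recall measured on data that keeps the original imbalance, which matches the abstract's note that the model was tested on imbalanced data.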