Publication

Estimating discrete object orientation based on 2D images using deep learning techniques

Bibliographic details
Abstract: This thesis investigates approaches for determining the 3D orientation of vehicles from 2D images, a key challenge in computer vision with applications across robotics, autonomous driving, and maintenance support. Two main methodologies were explored: a Siamese neural network and a Deep Convolutional Neural Network (DCNN) approach, each tested across varied dataset configurations. The Siamese network was implemented with VGG and ResNet architectures, achieving a peak accuracy of 95.8% using VGG16 on RGB images without background. However, the ResNet configurations in this approach showed lower performance, potentially due to dataset limitations and overfitting. The second approach employed DCNN models with both ResNet and EfficientNet architectures, systematically evaluating combinations of original and augmented dataset variations. ResNet152 achieved the highest accuracy of 96.39% on augmented RGB images without background, demonstrating superior robustness and adaptability to data variations. EfficientNet B2 also performed well, but overall, the ResNet models exhibited more consistent results across scenarios. The results underscore the effectiveness of DCNN architectures, particularly ResNet, for orientation inference tasks, indicating their resilience and accuracy across diverse data conditions. Future work will explore sensor fusion techniques to integrate additional data sources, such as LiDAR or radar, with RGB images to further enhance vehicle orientation detection accuracy. This research contributes to advancing 3D object orientation detection methods and highlights promising avenues for continued innovation in computer vision applications.
Main authors: Yahia, Youssef Bel Haj
Subject: Computer vision; Siamese networks; DCNN
Year: 2024
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Instituto Politécnico de Bragança
Language: English
Source: Biblioteca Digital do IPB