Publication

Artificial intelligence-based control of autonomous vehicles in simulation: a CNN vs. RL case study

Bibliographic details
Abstract:	The article compares Convolutional Neural Network (CNN) and Reinforcement Learning (RL) approaches to autonomous driving, using the CARLA (CAr Learning to Act) simulator for training and evaluation. The analysis of results revealed the CNN's better overall performance: it demonstrated a more refined driving experience, shorter training durations, and a more straightforward learning curve and optimization process. However, it required data labelling. In contrast, RL relied on an exhaustive (unsupervised) exploration of different models, ultimately selecting the model at timestep 600,000, which had the highest mean reward. Nevertheless, the RL approach proved susceptible to excessive oscillations and inconsistencies, necessitating additional optimization and tuning of hyperparameters and reward functions. This conclusion is further substantiated by a range of performance metrics (objective and subjective) designed to assess each approach.
Main authors:	Vasiljević, Ive
Other authors:	Musić, Josip; Lima, José
Subject:	Reinforcement learning; CNN; CARLA simulator
Year:	2024
Country:	Portugal
Document type:	conference paper
Access type:	open access
Associated institution:	Instituto Politécnico de Bragança
Language:	English
Source:	Biblioteca Digital do IPB
