Publication

Deep reinforcement learning for robot navigation systems

Bibliographic details
Abstract:	Reinforcement learning in robotics has been a challenging topic for the past few years. The ability to equip a robot with a tool powerful enough to let it autonomously discover optimal behaviour through trial-and-error interactions with its environment has motivated numerous in-depth research projects. This dissertation presents a thorough theoretical foundation for different reinforcement learning algorithms. Three algorithms, namely Q-Learning, Monte Carlo Policy Gradient and Deep Deterministic Policy Gradient, were selected and implemented on OpenAI Gym control environments. The selected environments were MountainCar, CartPole and Pendulum, which allowed a wide variety of algorithms to be applied across different action spaces and state spaces. For each implemented algorithm, a detailed hyperparameter configuration is analysed and compared. A simulated agent was also created in V-REP and configured via ROS and a Python control node. The agent is a Bot’n Roll ONE A robot, a differential-drive robot with embedded distance sensors. The goal of the robot is to solve three mazes of increasing complexity using its distance sensors. Tests were carried out with different sensor topologies, using the embedded distance sensors and additional Time-of-Flight sensors. The Q-Learning and Monte Carlo Policy Gradient algorithms were implemented on the simulated robot. Q-Learning allowed a comparison between two methods that differ in action selection timing. One of the methods was able to solve the three mazes using the embedded discrete distance sensors. With the Monte Carlo Policy Gradient algorithm, a thorough analysis of how reward functions influence the robot's learned policies is presented.
The Deep Deterministic Policy Gradient algorithm, although not implemented on the simulated robot, demonstrated significant potential, with several essential advantages such as a stochastic behaviour policy paired with a deterministic target policy, the Actor-Critic method and continuous control.
Main authors:	Ribeiro, Tiago Alcântara
Subject:	Machine learning; Reinforcement learning; Deep learning; Robotics; Navigation systems; Aprendizagem máquina; Aprendizagem por reforço; Aprendizagem profunda; Robótica; Sistemas de navegação
Year:	2019
Country:	Portugal
Document type:	master's dissertation
Access type:	open access
Affiliated institution:	Universidade do Minho
Language:	English
Source:	RepositóriUM - Universidade do Minho
