Publication

Tradeoff between moving targets, gradient magnitude and performance in quantum variational Q-Learning

Bibliographic details
Abstract: Reinforcement Learning (RL) consists of designing agents that make intelligent decisions without human supervision. When used alongside function approximators such as Neural Networks (NNs), RL is capable of solving extremely complex problems. Deep Q-Learning, an RL algorithm that uses deep NNs, has even achieved superhuman performance in some specific tasks. It is also possible, however, to use Variational Quantum Circuits (VQCs) as function approximators in RL algorithms. This work empirically studies the performance and trainability of such VQC-based Deep Q-Learning models in the OpenAI Gym CartPole-v0 and Acrobot-v1 environments. More specifically, we investigate how data re-uploading affects both of these metrics. We show that the magnitude and the variance of the gradients of these models remain substantial throughout training due to the moving targets of Deep Q-Learning. Moreover, we show that increasing the number of qubits does not lead to a decrease in the magnitude and variance of the gradients, contrary to what would be expected from the Barren Plateau Phenomenon. This suggests that VQCs may be especially well suited for use as function approximators in such a context. We also use the Universal Quantum Classifier as a function approximator in VQC-based Deep Q-Learning and implement VQC-based models capable of achieving considerable performance in the Acrobot-v1 environment, which had not previously been explored with VQCs.
Main authors: Coelho, Rodrigo da Silva Gomes Peres
Subject: Reinforcement learning; Quantum computing; Variational quantum circuits; Neural networks
Year: 2023
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Universidade do Minho
Language: English
Source: RepositóriUM - Universidade do Minho