Publication

Path integral learning of multidimensional movement trajectories

Bibliographic details
Abstract: This paper explores the use of Path Integral Methods, particularly several variants of the recent Path Integral Policy Improvement (PI²) algorithm, in multidimensional movement parametrized policy learning. We rely on Dynamic Movement Primitives (DMPs) to codify discrete and rhythmic trajectories, and apply the PI²-CMA and PI^BB methods in the learning of optimal policy parameters, according to different cost functions that inherently encode movement objectives. Additionally, we merge both of these variants and propose the PI^BB-CMA algorithm, comparing all of them with the vanilla version of PI². From the obtained results we conclude that PI^BB-CMA surpasses all other methods in terms of convergence speed and iterative final cost, which leads to an increased interest in its application to more complex robotic problems.
Main authors: André, João
Other authors: Santos, Cristina; Costa, Lino
Subject: Path Integral; Dynamic Movement Primitives; Parametrized Policies; Reinforcement Learning; Robotics; Black Box Optimization; Natural Sciences::Mathematics
Year: 2013
Country: Portugal
Document type: conference paper
Access type: open access
Associated institution: Universidade do Minho
Language: English
Source: RepositóriUM - Universidade do Minho