Evaluation of Reinforcement Learning Policies Applied to a Bipedal Agent

Seguí Cordero, Aleix

URI: http://hdl.handle.net/11201/158095

Date: 2021

Submission date: 2021-10-13

Abstract:

[spa] For many years, video games have adopted artificial intelligence technologies to improve their capabilities, making it possible to create objects whose intelligence is independent of the player. Reinforcement learning and genetic algorithms are among the artificial intelligence technologies most widely used in video games. Google's DeepMind offers a clear example, and it was one of the motivations for this work: in its experiments, bodies with different anatomies learn to move across irregular terrain. In this work, a bipedal body has been built with video game libraries, namely Pymunk and Pygame, and it attempts to learn to walk through reinforcement learning and genetic algorithms. The agents are restricted to a basic neural network containing 12 neurons. The algorithm modifies the parameters of this network by applying noise and selecting the best agents. Agents are selected according to learning policies, which value characteristics such as speed, distance travelled, or whether the agent walks upright. The policies are compared through a statistical analysis of the results obtained. The convergence of the algorithms is also analyzed to ensure that the number of generations does not prevent learning from reaching its maximum. In the results obtained, no agent emulates the biomechanics of human walking. On the other hand, policies that combine different rewards are found to obtain better results.
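The abstract outlines the learning loop only briefly: noise is applied to the network's parameters, and the best agents of each generation are selected. The following minimal Python sketch shows that kind of neuroevolution loop under several assumptions. It assumes Gaussian noise, truncation selection with elitism, and a flat list of parameters. The names `evolve`, `mutate` and `toy_fitness`, the population size, the noise scale and the toy fitness function are all hypothetical. In the thesis, fitness would come from a Pymunk simulation scored by a learning policy rather than from a closed-form function.

```python
import random
from typing import Callable, List, Tuple

Params = List[float]  # flat list of network weights/biases (hypothetical layout)


def mutate(parent: Params, sigma: float, rng: random.Random) -> Params:
    """Return a copy of `parent` with independent Gaussian noise added to every parameter."""
    return [w + rng.gauss(0.0, sigma) for w in parent]


def evolve(
    fitness: Callable[[Params], float],
    n_params: int,
    pop_size: int = 20,
    n_elite: int = 4,
    sigma: float = 0.1,
    generations: int = 50,
    seed: int = 0,
) -> Tuple[Params, List[float]]:
    """Evolve parameter vectors by noise mutation plus selection of the best agents.

    Returns the best parameters found and the best fitness of each generation,
    which is the curve a convergence analysis would inspect.
    """
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(n_params)] for _ in range(pop_size)]
    best_history: List[float] = []
    for _ in range(generations):
        # Rank every agent from best to worst (sorted calls fitness once per agent).
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:n_elite]
        best_history.append(fitness(elite[0]))
        # Elitism: the selected agents survive unchanged; the rest of the next
        # generation are noisy copies of randomly chosen elite agents.
        children = [mutate(rng.choice(elite), sigma, rng) for _ in range(pop_size - n_elite)]
        population = elite + children
    best = max(population, key=fitness)
    return best, best_history


def toy_fitness(params: Params) -> float:
    """Hypothetical stand-in for a simulated episode: higher is better, maximum 0 when all parameters are 0.5."""
    return -sum((w - 0.5) ** 2 for w in params)
```

For example, `best, history = evolve(toy_fitness, n_params=12)` returns a 50-value `history`. The value `n_params=12` is arbitrary and does not reflect how a 12-neuron network would actually be parameterized. This history never decreases, because elite agents are carried over unchanged and the toy fitness is deterministic. With a noisy simulated fitness that guarantee would not hold, which is one reason to check convergence statistically.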

[eng] For many years, video games have adopted technologies provided by artificial intelligence to improve their quality, and with them began to create objects whose intelligence is independent of the player. Reinforcement learning and genetic algorithms are among the AI technologies most used in video games. Google's DeepMind provides a clear example and was an inspiration for this project: in its work, bodies with different anatomies, including bipedal structures, learn to walk and overcome different types of obstacles. In this project we build a bipedal structure with video game libraries, namely Pygame and Pymunk, which attempts to learn to walk correctly. We use reinforcement learning and genetic algorithms with a simple neural network that has only 12 neurons. Our algorithm modifies the network's parameters and selects the best individuals of each generation, judging them by their performance during a simulation. Performance is evaluated by learning policies that reward the distance travelled, the velocity and the mean height during the simulation. At the end of the project we carry out a statistical analysis to determine the best learning policy. The convergence of the algorithm is also analyzed to ensure that the number of generations is enough to reach the maximum point of learning. In the results obtained, the agents do not emulate human walking biomechanics. On the other hand, we conclude that policies combining different rewards obtain better results.
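The learning policies reward distance travelled, velocity and mean height, with mean height serving as a proxy for walking upright. The abstract reports that combined policies perform best. One way to express single and combined policies is as functions of a recorded episode, as in the sketch below. It assumes that the torso position is sampled as (time, x, y) and that y increases upward. The actual sign depends on how the simulation's gravity and axes are set up. The `Episode` type, the policy names and the weights of the combined policy are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass
class Episode:
    # (time in seconds, torso x, torso y) samples recorded during one simulation (hypothetical format)
    samples: List[Tuple[float, float, float]]


def distance(ep: Episode) -> float:
    """Horizontal displacement of the torso between the first and last sample."""
    return ep.samples[-1][1] - ep.samples[0][1]


def velocity(ep: Episode) -> float:
    """Average horizontal speed over the episode (0 if the episode has no duration)."""
    duration = ep.samples[-1][0] - ep.samples[0][0]
    return distance(ep) / duration if duration > 0 else 0.0


def mean_height(ep: Episode) -> float:
    """Mean torso height, assuming y grows upward."""
    return mean(y for _, _, y in ep.samples)


Policy = Callable[[Episode], float]


def combined(weights: Dict[str, float]) -> Policy:
    """Build a policy that scores an episode as a weighted sum of single rewards."""
    terms: Dict[str, Policy] = {"distance": distance, "velocity": velocity, "height": mean_height}

    def policy(ep: Episode) -> float:
        return sum(w * terms[name](ep) for name, w in weights.items())

    return policy


# Single-reward policies plus one combined policy with hypothetical weights.
POLICIES: Dict[str, Policy] = {
    "distance": distance,
    "velocity": velocity,
    "height": mean_height,
    "combined": combined({"distance": 1.0, "velocity": 0.5, "height": 2.0}),
}
```

Consider a three-sample episode that moves 1 unit in 2 s at heights 1.0, 0.9 and 0.8. Its distance, velocity and mean height are 1.0, 0.5 and 0.9, and the combined policy scores 1.0 + 0.25 + 1.8 = 3.05. Because the terms have different units and scales, the weights largely determine which behaviour a combined policy favours.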
