Publicação

Evolving Decision Rules with Geometric Semantic Genetic Programming

Detalhes bibliográficos
Resumo:	Due to the ever increasing amount of data available in today’s world, a variety of methods to harness this information are continuously being created, refined and utilized, drawing inspiration from a multitude of sources. Relevant to this work are Supervised Learning techniques, that attempt to discover the relationship between the characteristics of data and a certain feature, to uncover the function that maps input to output. Among these, Genetic Programming (GP) attempts to replicate the concept of evolution as defined by Charles Darwin, mimicking natural selection and genetic operators to generate and improve a population of solutions for a given prediction problem. Among the possible variants of GP, Geometric Semantic Genetic Programming (GSGP) stands out, due to its focus on the meaning of each individual it creates, rather than their structure. It achieves by imagining an hypothetical and perfect model, and evaluating the performance of others by measuring how much their behaviour differ from it, and uses a set of genetic operators that have a specific effect on the individual’s semantics (i.e., its predictions for training data), with the goal of reaching ever closer to the so called perfect specimen. This thesis conceptualizes and evaluates the performance of aGSGPimplementation made specifically to deal with multi-class classification problems, using tree-based individuals that are composed by a set of rules to allow the categorization of data. This is achieved through the careful translation of GSGP’s theoretical foundation, first into algorithms and then into an actual code library, able to tackle problems of this domain. The results demonstrate that the implementation works successfully and respects the properties of the the original technique, allowing us to obtain excellent results on training data, although performance on unseen data is a slightly worse than that of other state-of-the-art algorithms.
Autores principais:	Rasteiro, Diogo Miguel Galveia De Oliveira
Assunto:	Genetic Programming Geometric Semantic Genetic Programming Machine Learning Decision Trees Programação Genética Programação Genética em Geometria Semântica Aprendizagem Automática Árvores de Decisão
Ano:	2023
País:	Portugal
Tipo de documento:	dissertação de mestrado
Tipo de acesso:	acesso aberto
Instituição associada:	Universidade Nova de Lisboa
Idioma:	inglês
Origem:	Repositório Institucional da UNL

Descrição
Resumo:	Due to the ever increasing amount of data available in today’s world, a variety of methods to harness this information are continuously being created, refined and utilized, drawing inspiration from a multitude of sources. Relevant to this work are Supervised Learning techniques, that attempt to discover the relationship between the characteristics of data and a certain feature, to uncover the function that maps input to output. Among these, Genetic Programming (GP) attempts to replicate the concept of evolution as defined by Charles Darwin, mimicking natural selection and genetic operators to generate and improve a population of solutions for a given prediction problem. Among the possible variants of GP, Geometric Semantic Genetic Programming (GSGP) stands out, due to its focus on the meaning of each individual it creates, rather than their structure. It achieves by imagining an hypothetical and perfect model, and evaluating the performance of others by measuring how much their behaviour differ from it, and uses a set of genetic operators that have a specific effect on the individual’s semantics (i.e., its predictions for training data), with the goal of reaching ever closer to the so called perfect specimen. This thesis conceptualizes and evaluates the performance of aGSGPimplementation made specifically to deal with multi-class classification problems, using tree-based individuals that are composed by a set of rules to allow the categorization of data. This is achieved through the careful translation of GSGP’s theoretical foundation, first into algorithms and then into an actual code library, able to tackle problems of this domain. The results demonstrate that the implementation works successfully and respects the properties of the the original technique, allowing us to obtain excellent results on training data, although performance on unseen data is a slightly worse than that of other state-of-the-art algorithms.