Publication

Selecção de variáveis em estatística multivariada

Bibliographic details
Abstract: The variable selection problem consists of identifying a k-subset of a set of original variables that is optimal under a given criterion measuring how well the subset approximates the whole data set. In this work we present and discuss some new optimization criteria, together with others suggested in the literature. We also present and discuss algorithms for the optimization problems that arise from the different criteria, along with the computational results obtained. The criteria and algorithms are available in the Subselect package, which is called from the statistical program R. The package is continuously updated with contributions from various sources, this work among them. Both the package and the program are in the public domain and available on the Internet. We also discuss a multiple-criteria optimization approach to the problem of identifying subsets of variables, in which we look for subsets that are optimal for several criteria simultaneously. The total order induced by a single criterion gives way to a partial order, associated with a set of solutions that cannot be improved on all criteria simultaneously. Such solutions are usually called maximal, efficient, or Pareto optimal.
Main authors: Minhoto, Manuel Joaquim Piteira
Subject: variable selection; multivariate statistics; combinatorial optimization; heuristics; Pareto optimal
Year: 2009
Country: Portugal
Document type: doctoral thesis
Access type: open access
Associated institution: Universidade de Lisboa
Language: Portuguese
Source: Repositório da Universidade de Lisboa
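
The notion of Pareto-optimal k-subsets mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's method and does not use the Subselect package: the function names, the variable names `x1` to `x4`, and the per-variable scores are all hypothetical, and the two additive criteria are toy stand-ins for real approximation criteria, which are generally not additive. The sketch enumerates every k-subset exhaustively, which is feasible only for very small problems.

```python
from itertools import combinations


def dominates(u, v):
    """Return True if score vector u dominates v (all criteria maximized):
    u is at least as good on every criterion and strictly better on one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_optimal_subsets(variables, k, criteria):
    """Enumerate all k-subsets, score each on every criterion, and keep
    the subsets whose score vector is not dominated by any other."""
    scored = [(s, tuple(c(s) for c in criteria)) for s in combinations(variables, k)]
    return [
        (s, score)
        for s, score in scored
        if not any(dominates(other, score) for _, other in scored)
    ]


# Hypothetical per-variable scores, invented purely for illustration.
score_a = {"x1": 3, "x2": 1, "x3": 2, "x4": 0}
score_b = {"x1": 0, "x2": 3, "x3": 2, "x4": 1}

# Two toy criteria to maximize (assumption: additive over the subset).
criteria = [
    lambda s: sum(score_a[v] for v in s),
    lambda s: sum(score_b[v] for v in s),
]

front = pareto_optimal_subsets(["x1", "x2", "x3", "x4"], 2, criteria)
for subset, score in front:
    print(subset, score)
```

With a single criterion the subsets would be totally ordered and one best subset could be reported. With two criteria, several subsets typically remain, none of which can be improved on one criterion without losing on the other.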