Publication
Selecção de variáveis em estatística multivariada (Variable selection in multivariate statistics)
| Abstract: | The variable selection problem consists of identifying a k-subset of a set of original variables that is optimal under a given criterion measuring how well the subset approximates the whole data set. In this work we present and discuss several optimization criteria, some new and others suggested by the literature. We also present and discuss algorithms for the optimization problems arising from the different criteria, together with computational results. The criteria and algorithms are available in the subselect package for the statistical program R. The package is continually updated with contributions from various sources, this work among them. Both the package and R are freely available on the Internet. We also discuss a multiple-criteria optimization approach to the problem of identifying subsets of variables, in which we look for subsets that are optimal for several criteria simultaneously. The total order induced by a single criterion gives way to a partial order, with which is associated a set of solutions that cannot be improved in all criteria simultaneously. These are usually called maximal, efficient, or Pareto optimal solutions. |
|---|---|
| Main authors: | Minhoto, Manuel Joaquim Piteira |
| Subject: | variable selection; multivariate statistics; combinatorial optimization; heuristics; Pareto optimal |
| Year: | 2009 |
| Country: | Portugal |
| Document type: | doctoral thesis |
| Access type: | open access |
| Associated institution: | Universidade de Lisboa |
| Language: | Portuguese |
| Source: | Repositório da Universidade de Lisboa |
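
The multiple-criteria idea in the abstract can be illustrated with a minimal sketch. It enumerates every k-subset of a small variable set, scores each subset on two criteria, and keeps the subsets that no other subset dominates. Everything here is an assumption made for illustration: the variable names, the additive per-variable scores, and the usual dominance rule (at least as good on every criterion, strictly better on at least one). None of it reproduces the criteria or algorithms of the thesis or of the subselect package.

```python
from itertools import combinations


def dominates(a, b):
    """True if score tuple a is at least as good as b on every criterion
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scored):
    """Keep the (subset, scores) pairs whose scores no other pair dominates."""
    return [
        (subset, scores)
        for subset, scores in scored
        if not any(dominates(other, scores) for _, other in scored)
    ]


# Hypothetical toy data: made-up per-variable contributions to two criteria.
# Real selection criteria are generally not additive; this is only a stand-in.
crit_a = {"x1": 5, "x2": 3, "x3": 1, "x4": 4}
crit_b = {"x1": 1, "x2": 4, "x3": 5, "x4": 2}

k = 2  # subset size

# Exhaustive enumeration of all k-subsets, feasible only for small problems.
scored = [
    (subset, (sum(crit_a[v] for v in subset), sum(crit_b[v] for v in subset)))
    for subset in combinations(sorted(crit_a), k)
]

front = pareto_front(scored)

for subset, scores in front:
    print(subset, scores)
```

With a single criterion the subsets could simply be sorted, which is the total order the abstract mentions. With two criteria some pairs of subsets are incomparable, so the result is a set of non-dominated subsets instead of a single winner.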