Publication

Variable selection in linear regression models with large number of predictors

Bibliographic details
Abstract: In this thesis, we study the problem of variable selection in linear regression models in the presence of a large number of predictors. Usually, some of these predictors are correlated, so including all of them in a regression model does not necessarily improve the model's predictive ability. Also, models with a reasonable and tractable number of predictors are easier to interpret than models with a large number of predictors. Therefore, variable selection is an important problem to study. Some popular regression methods can handle collinearity in data but still require the removal of irrelevant predictors, so we present an algorithm that enables these methods to perform variable selection. We review the well-known variable selection methods and investigate the performance of these methods, as well as the proposed approach, on both simulated and real data sets. The results show that the new algorithm performs well in selecting the relevant variables. Also, when the data contain outliers, outlier detection and variable selection are not separable problems. Therefore, we propose a method capable of both outlier detection and variable selection. We review the well-known robust variable selection methods and compare their performance with that of the proposed approach on contaminated simulated data sets as well as on real data. The results show that the proposed method performs well in both outlier detection and robust variable selection.
Main authors: Shahriari, Shirin
Subject: Bootstrap; Least angle regression (LARS); Linear regression; Partial least squares regression (PLSR); Outlier detection; Variable selection; Principal components regression (PCR); Natural Sciences::Other Natural Sciences
Year: 2014
Country: Portugal
Document type: doctoral thesis
Access type: open access
Associated institution: Universidade do Minho
Language: English
Source: RepositóriUM - Universidade do Minho
_version_ 1867438357282291712
author Shahriari, Shirin
author_facet Shahriari, Shirin
author_role author
contributor_name_str_mv Faria, Susana
Gonçalves, A. Manuela
RepositóriUM - Universidade do Minho
country_str PT
datacite.date.Accepted.fl_str_mv 2014-12-18T00:00:00Z
datacite.date.available.fl_str_mv 2015-06-03T16:08:39Z
datacite.date.embargoed.fl_str_mv 2015-06-03T16:08:39Z
datacite.rights.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv https://hdl.handle.net/1822/35421
dc.type.none.fl_str_mv http://purl.org/coar/resource_type/c_db06
dirty 0
eu_rights_str_mv openAccess
format doctoralThesis
fulltext.url.fl_str_mv https://repositorium.uminho.pt/bitstreams/b77d6f8f-4295-4be7-9d06-442183dd2b0d/download
id rum_d1ce43e37b752a97cc7d079a2c2f8b7d
identifier.url.fl_str_mv https://hdl.handle.net/1822/35421
instacron_str repositorium
institution Universidade do Minho
instname_str Universidade do Minho
language eng
network_acronym_str rum
network_name_str RepositóriUM - Universidade do Minho
oai_identifier_str oai:repositorium.uminho.pt:1822/35421
organization_str_mv urn:organizationAcronym:repositorium
person_str_mv Shahriari, Shirin
publishDate 2014
reponame_str RepositóriUM - Universidade do Minho
repository_id_str urn:repositoryAcronym:rum
service_str_mv urn:repositoryAcronym:rum
spelling Hosting institution: RepositóriUM - Universidade do Minho (e-mail: repositorium@usdb.uminho.pt); TID: 101370369; dates: 2015-06-03T16:08:39Z, 2014-12-18, 2014-10-27; file: application/pdf, 3541471 bytes; subject classification: Fields of Science and Technology (FOS), http://www.oecd.org/science/inno/38235147.pdf
status SINGLETON
url https://hdl.handle.net/1822/35421
visible 1