Publication
Variable selection in linear regression models with large number of predictors
| Field | Value |
|---|---|
| Abstract | In this thesis, we study the problem of variable selection in linear regression models with a large number of predictors. Some of these predictors are usually correlated, so including all of them in a regression model does not substantially improve its predictive ability. Moreover, models with a reasonable, tractable number of predictors are easier to interpret than models with many predictors. Variable selection is therefore an important problem. Some popular regression methods can handle collinearity in the data but still require irrelevant predictors to be removed, so we present an algorithm that enables these methods to perform variable selection. We review well-known variable selection methods and compare their performance with that of the proposed approach on both simulated and real data sets. The results show that the new algorithm performs well in selecting the relevant variables. In addition, when the data contain outliers, outlier detection and variable selection cannot be treated as two separate problems. We therefore propose a method that performs both outlier detection and variable selection. We review well-known robust variable selection methods and compare them with the proposed approach on contaminated simulated data sets as well as on real data. The results show that the proposed method performs well in terms of both outlier detection and robust variable selection. |
| Main authors | Shahriari, Shirin |
| Subjects | Bootstrap; Least angle regression (LARS); Linear regression; Partial least squares regression (PLSR); Outlier detection; Variable selection; Principal components regression (PCR) |
| Year | 2014 |
| Country | Portugal |
| Document type | Doctoral thesis |
| Access type | Open access |
| Associated institution | Universidade do Minho |
| Language | English |
| Source | RepositóriUM - Universidade do Minho |
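The abstract describes giving collinearity-tolerant methods such as PCR and PLSR a way to drop irrelevant predictors, and the subject keywords mention the bootstrap. The sketch below is not the thesis's algorithm. It is a minimal, hypothetical illustration of one generic way to combine the two ideas, assuming a one-component PLS1 fit and 95% bootstrap percentile intervals: each coefficient is re-estimated on bootstrap resamples of the observations, and a predictor is kept only if its interval excludes zero. The helper names `pls1_one_component` and `bootstrap_select`, the simulated data, and all parameter values are illustrative assumptions, and only the Python standard library is used.

```python
import random
import statistics


def pls1_one_component(X, y):
    """Regression coefficients of a one-component PLS1 fit on centred data.

    X is a list of p columns (each a list of n floats) and y a list of n floats.
    With one component the weight vector is w = Xc^T yc, the score vector is
    t = Xc w, and the coefficients are beta = w * (t^T yc) / (t^T t).
    """
    y_mean = statistics.fmean(y)
    yc = [v - y_mean for v in y]
    Xc = []
    for col in X:
        m = statistics.fmean(col)
        Xc.append([v - m for v in col])
    # Weights: unnormalised covariances of each predictor with y.
    w = [sum(a * b for a, b in zip(col, yc)) for col in Xc]
    # Scores: projection of each observation onto the weight vector.
    t = [sum(wj * col[i] for wj, col in zip(w, Xc)) for i in range(len(y))]
    tt = sum(v * v for v in t)
    ty = sum(a * b for a, b in zip(t, yc))
    return [wj * ty / tt for wj in w]


def bootstrap_select(X, y, n_boot=200, alpha=0.05, seed=0):
    """Keep predictors whose bootstrap percentile interval excludes zero."""
    rng = random.Random(seed)
    n, p = len(y), len(X)
    draws = [[] for _ in range(p)]
    for _ in range(n_boot):
        # Resample observations (rows) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [[col[i] for i in idx] for col in X]
        yb = [y[i] for i in idx]
        for j, b in enumerate(pls1_one_component(Xb, yb)):
            draws[j].append(b)
    selected = []
    for j, d in enumerate(draws):
        d.sort()
        lo = d[int(alpha / 2 * n_boot)]
        hi = d[int((1 - alpha / 2) * n_boot) - 1]
        if lo > 0 or hi < 0:
            selected.append(j)
    return selected


# Hypothetical simulated data: x0 and x1 are correlated and relevant,
# x2..x4 are pure noise.
rng = random.Random(42)
n = 200
x0 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * a + 0.6 * rng.gauss(0, 1) for a in x0]
noise = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(3)]
X = [x0, x1] + noise
y = [3 * a + 2 * b + rng.gauss(0, 1) for a, b in zip(x0, x1)]

selected = bootstrap_select(X, y)
print("selected predictor indices:", selected)
```

With a single PLS component, the weight of each predictor is proportional to its sample covariance with the response. An irrelevant predictor that is strongly correlated with a relevant one would therefore also receive a nonzero coefficient, and contaminated data would distort every bootstrap fit, so a practical procedure needs more than this sketch.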
Related records
Outlier detection and robust variable selection for least angle regression
by: Shahriari, Shirin
Published: (2014)
Variable selection methods in high-dimensional regression: a simulation study
by: Shahriari, Shirin
Published: (2015)
Robust linear model selection in high dimensional data
by: Shahriari, Shirin
Published: (2013)
A non parametric robust method for the detection of outliers in linear models
by: Faria, Susana
Published: (2006)
Selecção robusta em modelos de regressão linear com um grande número de preditores [Robust selection in linear regression models with a large number of predictors]
by: Shahriari, Shirin
Published: (2017)
High-throughput FTIR-based bioprocess analysis of recombinant cyprosin production
by: Sampaio, Pedro
Published: (2017)
Robust identification of target genes and outliers in triple-negative breast cancer data
by: Segaert, Pieter
Published: (2018)
UV spectrophotometry method for the monitoring of galacto-oligosaccharides production
by: Dias, Luís G.
Published: (2009)
In situ near infrared spectroscopy monitoring of cyprosin production by recombinant Saccharomyces cerevisiae strains
by: Sampaio, Pedro
Published: (2014)
The impact of banking crises on public debt : the case-study of Portugal (1970-2015)
by: Pinto, Nuno Dias Duarte Parraça
Published: (2021)
Fitting mixtures of linear regressions
by: Faria, Susana
Published: (2010)
Evaluation of green coffee beans quality using near infrared spectroscopy: A quantitative approach
by: Santos, João Rodrigo
Published: (2012)
Two-way relationship between inequality and growth within fiscal policy channel : an empirical assessment for European countries
by: Coelho, José Carlos
Published: (2021)
Controlling factors of the size and location of large gully systems: a regression-based exploration using reconstructed pre-erosion topography
by: Bergonse, Rafaello
Published: (2016)
Testing for structural change in a mixture of linear regressions
by: Faria, Susana
Published: (2006)
Measuring the component overlapping in mixtures of linear regressions
by: Faria, Susana
Published: (2013)
Estimation of Serpa cheese ripening time using multiple linear regression (MLR) considering rheological, physical and chemical data
by: Alvarenga, Nuno
Published: (2008)
Misturas de regressões lineares: um novo teste de alteração da estrutura [Mixtures of linear regressions: a new test for structural change]
by: Faria, Susana
Published: (2006)
Quais os fatores que determinam o rendimento dos indivíduos em Portugal? - Regressão de Quantis [What factors determine individuals' income in Portugal? - Quantile regression]
by: Rodrigues, Cátia Sofia Martins
Published: (2021)
Valuing biodiversity enhancement in New Zealand's planted forests: socioeconomic and spatial determinants of willingness-to-pay
by: Yao, Richard T.
Published: (2014)
Técnica de regressão PLS e PCR: metodologias e aplicação [PLS and PCR regression techniques: methodologies and application]
by: Oliveira, Beatriz Azevedo
Published: (2022)
Estimation of Serpa cheese ripening time using multiple linear regression (MLR) considering rheological, physical and chemical data
by: Silva, Paula
Published: (2008)
A robust sparce linear approach for contamined data
by: Shahriari, Shirin
Published: (2019)
Forecasting in data-rich environments
by: Conraria, Luís Aguiar
Published: (2004)
Impact of variable transformations on multiple regression models for enhancing gait normalization
by: Ferreira, Flora
Published: (2023)
Estimation of effluent quality parameters from an activated sludge system using quantitative image analysis
by: Mesquita, D. P.
Published: (2016)
Shiny app to predict agricultural tire dimensions
by: Antunes, Ana Rita Oliveira
Published: (2020)
Enabling network inference methods to handle missing data and outliers
by: Folch-Fortuny, Abel
Published: (2015)
Analysis of the performance of students : the case of UP and UCP
by: Natividade, Miguel Stella da Silva
Published: (2018)
Previsão do valor Brix: aplicação de algoritmos de Machine Learning [Prediction of the Brix value: application of Machine Learning algorithms]
by: Silva, Catarina Andrade Mira Antunes da
Published: (2021)
Modelos lineares generalizados na análise de dados de saúde [Generalized linear models in the analysis of health data]
by: Fernando, Lúria Constância Cavalata
Published: (2021)
Nomophobia and its predictors: the role of psychological, sociodemographic, and internet use factors
by: Ferreira, Inês Saraiva
Published: (2025)
Monitoring olive oils quality and oxidative resistance during storage using an electronic tongue
by: Rodrigues, Nuno
Published: (2016)
Estimation of site index from environmental variables for eucalypt stands in Portugal
by: Phuong, Nguyen Thanh
Published: (2021)
Assessing Pension expenditure determinants : the case of Portugal
by: Garcia, Maria Teresa
Published: (2019)
Modeling fitness variable responses to training In prepubescent children
by: Alves, Ana R
Published: (2017)
On the enrichment of random factors models
by: Nunes, Célia
Published: (2017)
Automatic identification of activated sludge disturbances and assessment of operational parameters
by: Amaral, A. L.
Published: (2013)
Correlation between sludge settleability and image analysis information using Partial Least Squares
by: Mesquita, D. P.
Published: (2008)
Statistical analysis of variables of tires' specifications and respective tests
by: Santos, Renato Fernandes dos
Published: (2019)