Document details

SLUG

Author(s): Rodrigues, Nuno M. ; Batista, João E. ; La Cava, William ; Vanneschi, Leonardo ; Silva, Sara

Date: 2022

Persistent ID: http://hdl.handle.net/10362/142821

Origin: Repositório Institucional da UNL

Project/scholarship: info:eu-repo/grantAgreement/FCT/3599-PPCDT/PTDC%2FCCI-INF%2F29168%2F2017/PT; info:eu-repo/grantAgreement/FCT/3599-PPCDT/DSAIPA%2FDS%2F0022%2F2018/PT; info:eu-repo/grantAgreement/FCT/3599-PPCDT/DSAIPA%2FDS%2F0113%2F2019/PT;

Subject(s): Feature Selection; Epistasis; Genetic Programming; Genetic Algorithms; Machine Learning; Theoretical Computer Science; Computer Science(all)


Description

Rodrigues, N. M., Batista, J. E., La Cava, W., Vanneschi, L., & Silva, S. (2022). SLUG: Feature Selection Using Genetic Algorithms and Genetic Programming. In E. Medvet, G. Pappa, & B. Xue (Eds.), Genetic Programming: 25th European Conference, EuroGP 2022, Held as Part of EvoStar 2022, Madrid, Spain, April 20–22, 2022, Proceedings (pp. 68-84). (Lecture Notes in Computer Science; Vol. 13223). Springer. https://doi.org/10.1007/978-3-031-02056-8_5

This work was supported by FCT, Portugal, through funding of LASIGE Research Unit (UIDB/00408/2020 and UIDP/00408/2020); MAR2020 program via project MarCODE (MAR-01.03.01-FEAMP-0047); projects BINDER (PTDC/CCI-INF/29168/2017), AICE (DSAIPA/DS/0113/2019), OPTOX (PTDC/CTA-AMB/30056/2017) and GADgET (DSAIPA/DS/0022/2018). Nuno Rodrigues and João Batista were supported by PhD Grants 2021/05322/BD and SFRH/BD/143972/2019, respectively; William La Cava was supported by the National Library Of Medicine of the National Institutes of Health under Award Number R00LM012926.

We present SLUG, a method that uses genetic algorithms as a wrapper for genetic programming (GP) to perform feature selection while inducing models. The method is first tested on four regular binary classification datasets, and then on 10 synthetic datasets produced by GAMETES, a tool for embedding epistatic gene-gene interactions into noisy datasets. We compare the results of SLUG with those obtained by other GP-based methods that had already been used on the GAMETES problems, concluding that the proposed approach is very successful, particularly on the epistatic datasets. We discuss the merits and weaknesses of SLUG and its two parts, the wrapper and the learner, and we perform additional experiments comparing SLUG with other state-of-the-art learners, such as decision trees, random forests and extreme gradient boosting. Although SLUG is not the most efficient method in terms of training time, it is confirmed as the most effective method in terms of accuracy.
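The wrapper idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify SLUG's fitness function, genetic operators, or parameters, so all of those are assumptions here. In particular, the inner learner below is a simple majority-vote lookup table standing in for the GP run, the data is a toy XOR problem rather than a GAMETES dataset, and every function name and default value is hypothetical.

```python
import random


def make_xor_data(n_rows, n_features, rng):
    """Toy data (assumption): label is XOR of features 0 and 1, the rest is noise."""
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    y = [row[0] ^ row[1] for row in X]
    return X, y


def table_learner_accuracy(mask, X_train, y_train, X_val, y_val):
    """Stand-in for the GP learner: majority-vote lookup table on selected features."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # an empty feature set cannot be scored
    votes = {}
    for row, label in zip(X_train, y_train):
        key = tuple(row[i] for i in selected)
        votes.setdefault(key, [0, 0])[label] += 1
    correct = 0
    for row, label in zip(X_val, y_val):
        counts = votes.get(tuple(row[i] for i in selected), [1, 0])  # unseen: class 0
        predicted = 1 if counts[1] > counts[0] else 0
        correct += predicted == label
    return correct / len(y_val)


def ga_feature_selection(X_train, y_train, X_val, y_val, rng,
                         pop_size=20, generations=15, mutation_rate=0.1):
    """Genetic algorithm over binary feature masks; fitness comes from the inner learner."""
    n_features = len(X_train[0])

    def fitness(mask):
        return table_learner_accuracy(mask, X_train, y_train, X_val, y_val)

    def tournament(scored):
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_score, best_mask = -1.0, None
    for _ in range(generations):
        scored = [(fitness(mask), mask) for mask in population]
        gen_best = max(scored, key=lambda pair: pair[0])
        if gen_best[0] > best_score:
            best_score, best_mask = gen_best[0], list(gen_best[1])
        next_pop = [list(gen_best[1])]  # elitism: carry the generation's best forward
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            next_pop.append(child)
        population = next_pop
    return best_mask, best_score
```

Calling `ga_feature_selection` with a training and a validation split returns the best feature mask found and its validation accuracy. Because each fitness evaluation trains the inner learner from scratch, the cost of a wrapper like this grows with population size times generations, which is consistent with the training-time trade-off the abstract mentions.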

Document Type: Conference object

Language: English

Contributor(s): NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN
