Publication

Validation Strategies for Robust Assessment of QSAR Models


Bibliographic details
Abstract: Quantitative Structure-Activity Relationship (QSAR) modelling is vital in drug discovery and design. QSAR models are a good example of how technological tools are applied to improve drug discovery processes: they can classify molecules as active or inactive based on molecular descriptors. The development and validation of these models are regulated, and various criteria must be met before they are implemented. However, even models that meet these criteria and show low validation errors are still not widely used in current drug development or pharmacological research. This work explores a possible cause for this gap. More specifically, I study the quality of predictions under two validation approaches: retrospective and prospective. The prospective approach is the more realistic of the two, as the test set contains only the latest records documented for a particular target, simulating the task of making future predictions. In addition, the impact of structural similarity on QSAR modelling is assessed with hyperdimensional metric space models. Fourteen diverse targets and various models were selected to assess the quality of predictions for the two types of validation. The modelling approaches include Support Vector Machines, Random Forests, Extreme Gradient Boosting and Neural Networks. The validation strategies require different ways of partitioning the datasets: randomly (retrospective) or by the year the molecules were documented (prospective). Results show an average difference of 35% between the two validation approaches, with the largest difference reaching 65%. This large discrepancy can be problematic, since the prospective approach is the more realistic validation of the two.
These results provide evidence that prospective validation should be included in QSAR modelling, alongside preventive measures for the low model performance observed under this necessary type of validation.
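The two partitioning strategies named in the abstract can be illustrated with a minimal sketch. It is not the code used in the dissertation: the record fields (`smiles`, `year`, `active`), the function names, the toy data and the 20% test fraction are all assumptions made for illustration. The retrospective split shuffles records regardless of date, while the prospective split places only the most recently documented records in the test set.

```python
import random


def retrospective_split(records, test_fraction=0.2, seed=0):
    """Random partition: test molecules may come from any year."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def prospective_split(records, test_fraction=0.2):
    """Time-based partition: the test set holds only the latest records.

    The cutoff is a whole year, so records from the same year never
    end up on both sides of the split (the test set may therefore be
    slightly larger than the requested fraction).
    """
    ordered = sorted(records, key=lambda r: r["year"])
    n_test = max(1, round(len(ordered) * test_fraction))
    cutoff_year = ordered[-n_test]["year"]
    train = [r for r in ordered if r["year"] < cutoff_year]
    test = [r for r in ordered if r["year"] >= cutoff_year]
    return train, test


# Hypothetical toy records: (SMILES string, year documented, activity label).
records = [
    {"smiles": s, "year": y, "active": a}
    for s, y, a in [
        ("CCO", 2005, 0), ("CCN", 2008, 1), ("CCC", 2010, 0),
        ("c1ccccc1", 2012, 1), ("CC(=O)O", 2015, 1), ("CCCl", 2017, 0),
        ("CCBr", 2018, 1), ("CCF", 2019, 0), ("CO", 2021, 1), ("CN", 2022, 0),
    ]
]

retro_train, retro_test = retrospective_split(records)
pro_train, pro_test = prospective_split(records)
```

Training the same model on `retro_train` and on `pro_train`, then comparing its scores on the respective test sets, is the kind of comparison the abstract describes; the sketch covers only the partitioning step.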
Main authors: Almendra, Filipa Alexandra da Silva de Noronha
Subject: Quantitative Structure-Activity Relationship (QSAR); Machine learning; Validation; Prospective; Drug discovery; Master's theses - 2024
Year: 2024
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Universidade de Lisboa
Language: English
Source: Repositório da Universidade de Lisboa
author Almendra, Filipa Alexandra da Silva de Noronha
author_facet Almendra, Filipa Alexandra da Silva de Noronha
author_role author
contributor_name_str_mv Falcão, André Osório e Cruz de Azerêdo, 1969-
Repositório Científico de Acesso Aberto da ULisboa
country_str PT
creators_json_txt [{"Person.name":"Almendra, Filipa Alexandra da Silva de Noronha"}]
datacite.contributors.contributor.contributorName.fl_str_mv Falcão, André Osório e Cruz de Azerêdo, 1969-
Repositório Científico de Acesso Aberto da ULisboa
datacite.creators.creator.creatorName.fl_str_mv Almendra, Filipa Alexandra da Silva de Noronha
datacite.date.Accepted.fl_str_mv 2024-01-01T00:00:00Z
datacite.date.available.fl_str_mv 2025-01-16T12:49:58Z
datacite.date.embargoed.fl_str_mv 2025-01-16T12:49:58Z
datacite.rights.fl_str_mv http://purl.org/coar/access_right/c_abf2
datacite.subjects.subject.fl_str_mv Quantitative Structure-Activity Relationship (QSAR)
Machine learning
Validation
Prospective
Drug discovery
Master's theses - 2024
datacite.titles.title.fl_str_mv Validation Strategies for Robust Assessment of QSAR Models
dc.contributor.none.fl_str_mv Falcão, André Osório e Cruz de Azerêdo, 1969-
Repositório Científico de Acesso Aberto da ULisboa
dc.creator.none.fl_str_mv Almendra, Filipa Alexandra da Silva de Noronha
dc.date.Accepted.fl_str_mv 2024-01-01T00:00:00Z
dc.date.available.fl_str_mv 2025-01-16T12:49:58Z
dc.date.embargoed.fl_str_mv 2025-01-16T12:49:58Z
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv http://hdl.handle.net/10400.5/97252
dc.language.none.fl_str_mv eng
dc.rights.none.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.title.fl_str_mv Validation Strategies for Robust Assessment of QSAR Models
dc.type.none.fl_str_mv http://purl.org/coar/resource_type/c_bdcc
eu_rights_str_mv openAccess
format masterThesis
fulltext.url.fl_str_mv https://repositorio.ulisboa.pt/bitstreams/39f684ae-8c09-408e-8480-137aef0b314e/download
id ul_b3d51c60ccdecbe59cfe6e2eb2ee2b2d
identifier.url.fl_str_mv http://hdl.handle.net/10400.5/97252
instacron_str ul
institution Universidade de Lisboa
instname_str Universidade de Lisboa
language eng
network_acronym_str ul
network_name_str Repositório da Universidade de Lisboa
oai_identifier_str oai:repositorio.ulisboa.pt:10400.5/97252
organization_str_mv urn:organizationAcronym:ul
person_str_mv Almendra, Filipa Alexandra da Silva de Noronha
publishDate 2024
reponame_str Repositório da Universidade de Lisboa
repository_id_str urn:repositoryAcronym:ul
service_str_mv urn:repositoryAcronym:ul
spelling Hosting institution: Repositório Científico de Acesso Aberto da ULisboa; e-mail: repositorio@reitoria.ulisboa.pt; URN: urn:tid:203879635; file size: 1589192 bytes; resource type: literature, master thesis; rights: open access
title Validation Strategies for Robust Assessment of QSAR Models
url http://hdl.handle.net/10400.5/97252