Publication
Validation Strategies for Robust Assessment of QSAR Models
| Summary: | Quantitative Structure-Activity Relationship (QSAR) modelling is vital in drug discovery and design. QSAR models are a good example of how computational tools are applied to improve drug discovery processes: they can classify molecules as active or inactive based on molecular descriptors. The development and validation of these models are regulated, and various criteria must be met before they can be implemented. However, even models that meet these criteria and present low validation errors are still not widely used in current drug development or pharmacological research. This work explores a possible cause of this limited adoption. More specifically, I study the quality of predictions under two different validation approaches: retrospective and prospective. The prospective approach is the more realistic of the two, as the test set contains only the latest records documented for a particular target, simulating the task of making future predictions. In addition, the impact of structural similarity on QSAR modelling is assessed with hyperdimensional metric space models. Fourteen diverse targets and several models were selected to assess the quality of predictions for the two types of validation. The modelling approaches include Support Vector Machines, Random Forests, Extreme Gradient Boosting and Neural Networks. The validation strategies require different ways of partitioning the datasets: randomly (retrospective) or based on the year the molecules were documented (prospective). Results show an average difference of 35%, and a maximum difference of 65%, between the two validation approaches. This large discrepancy is problematic because the prospective approach is a more realistic validation than the retrospective approach. These results provide evidence for including prospective validation in QSAR modelling, alongside preventive measures against the low model performance observed under this necessary type of validation. |
|---|---|
| Main Authors: | Almendra, Filipa Alexandra da Silva de Noronha |
| Subject: | Quantitative Structure-Activity Relationship (QSAR); Machine learning; Prospective validation; Drug discovery; Master's theses - 2024 |
| Year: | 2024 |
| Country: | Portugal |
| Document type: | master thesis |
| Access type: | open access |
| Associated institution: | Universidade de Lisboa |
| Language: | English |
| Origin: | Repositório da Universidade de Lisboa |
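
The two partitioning schemes named in the summary, random (retrospective) and year-based (prospective), can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Record` type, its field names, the 20% test fraction and the toy data are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    # Hypothetical record layout; the thesis's actual data schema is not given.
    molecule_id: str
    year: int      # year the molecule was documented
    active: bool   # activity label for the target


Split = Tuple[List[Record], List[Record]]


def retrospective_split(records: List[Record],
                        test_fraction: float = 0.2,
                        seed: int = 0) -> Split:
    """Random partition: test molecules may come from any year."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def prospective_split(records: List[Record],
                      test_fraction: float = 0.2) -> Split:
    """Time-based partition: the test set holds only the latest records."""
    ordered = sorted(records, key=lambda r: r.year)
    n_test = max(1, round(len(ordered) * test_fraction))
    # Assumption: the cut is made by record count, so records sharing the
    # boundary year may fall on both sides of the split.
    return ordered[:-n_test], ordered[-n_test:]


# Toy data: one molecule per year, 2010 through 2019.
toy = [Record(f"mol{i}", 2010 + i, i % 2 == 0) for i in range(10)]
train_p, test_p = prospective_split(toy)
train_r, test_r = retrospective_split(toy, seed=1)
print("prospective test years:", [r.year for r in test_p])
print("retrospective test years:", sorted(r.year for r in test_r))
```

In the prospective split no training record is newer than any test record, which is what simulates predicting future compounds; the retrospective split gives no such guarantee.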