Publication
Integrating machine learning and time-to-event models to explain and predict risk of hospitalization due to dengue in Colombia
| Abstract: | Arboviral diseases such as dengue pose major public health challenges in endemic regions, notably in Norte de Santander (Colombia), where they place substantial pressure on healthcare services. We analyzed 8,814 confirmed dengue cases reported to the Colombian National Public Health Surveillance System (SIVIGILA) from January 2015 to June 2019 to investigate temporal dynamics and determinants of hospitalization. We applied a dual methodology based on: (i) machine-learning classifiers (logistic regression, random forest, and support vector machines) to predict hospitalization risk from symptom profiles, and (ii) Cox models with time-varying coefficients to assess the timing of hospitalization as a function of socio-environmental and clinical predictors, accommodating non-proportional hazards. Our main findings are as follows. On average, patients sought medical attention about four days after symptom onset. Severe and non-severe cases had similar onset-to-hospitalization times, but severe cases were often admitted shortly after the appearance of key warning signs. Abdominal pain and low platelet count markedly increased the risk of hospitalization in classification models and were associated with higher hazards of earlier hospitalization in the time-to-event analysis, with vomiting likewise linked to earlier hospitalization. Among classifiers, random forest achieved the highest predictive accuracy, whereas logistic regression and Cox models yielded interpretable estimates of risk (odds ratios) and timing (time-varying hazard ratios). These findings highlight the value of early recognition of specific symptoms and the integration of machine learning with survival analysis to support proactive, resource-aware dengue management. All analyses were conducted in R. |
|---|---|
| Main authors: | Velasco, Henry |
| Other authors: | Ortiz, Santiago; Catano-Lopez, Alexandra; Castro, Cecilia; Martin-Barreiro, Carlos; Leiva, Víctor |
| Subject: | Natural Sciences::Mathematics |
| Year: | 2025 |
| Country: | Portugal |
| Document type: | article |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | English |
| Source: | RepositóriUM - Universidade do Minho |
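
As a reading aid, the two interpretable quantities named in the abstract (odds ratios and time-varying hazard ratios) can be written out in generic textbook form. This is only a sketch in assumed notation: the symbols $Y$, $x_j$, $\beta_j$, $\gamma_j(t)$, and $h_0(t)$ do not come from this record, and the specification used in the article may differ.

$$
\log\frac{\Pr(Y=1\mid x)}{1-\Pr(Y=1\mid x)} = \beta_0 + \sum_j \beta_j x_j,
\qquad \mathrm{OR}_j = e^{\beta_j}
$$

$$
h(t\mid x) = h_0(t)\,\exp\Big(\sum_j \gamma_j(t)\,x_j\Big),
\qquad \mathrm{HR}_j(t) = e^{\gamma_j(t)}
$$

Here $Y=1$ is taken to denote hospitalization, $x_j$ a predictor such as the presence of abdominal pain, and $t$ the time since symptom onset. Under these assumptions, $\mathrm{OR}_j > 1$ means the predictor raises the odds of hospitalization, and $\mathrm{HR}_j(t) > 1$ means it raises the instantaneous rate of hospitalization at time $t$, that is, it is associated with earlier admission. When $\gamma_j(t)$ is constant in $t$, the second expression reduces to the ordinary proportional-hazards Cox model.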