Publication
Data and computer center prediction of usage and cost: an interpretable machine learning approach
| Field | Value |
|---|---|
| Abstract: | In recent years, cloud computing usage has increased considerably, and it is now the backbone of many emerging applications. However, behind cloud structures are physical infrastructures (data centers) that are difficult to manage due to unpredictable utilization patterns. To address the constraints of reactive auto-scaling, data centers are widely adopting predictive cloud resource management mechanisms. However, predictive methods rely on application workloads and are typically pre-optimized for specific patterns, which can cause under- or over-provisioning of resources. Accurate workload forecasts are necessary to gain efficiency, save money, and provide clients with better and faster services. Working with real data from a Portuguese bank, we propose the Ensemble Adaptive Model with Drift detector (EAMDrift). This novel method combines forecasts from multiple individual predictors by weighting each model's prediction according to a performance metric. EAMDrift automatically retrains when needed and identifies the most appropriate models to use at each moment through interpretable mechanisms. We tested our methodology on a real-data problem by studying the influence of external signals (mass and social media) on data center workloads. Because the data come from a bank, we hypothesize that users may increase or decrease their use of some applications depending on external factors such as controversies or economic news. For this study, EAMDrift was designed to support multiple past covariates. We evaluated EAMDrift on different workloads and compared the results with several baseline models. The experimental evaluation shows that EAMDrift outperforms individual baseline models by 15% to 25%. Compared to the best black-box ensemble model, our model has a comparable error (1-3% higher). Thus, this work suggests that interpretable models are a viable solution for data center workload prediction. |
| Main authors: | Mateus, Gonçalo Furtado |
| Subject: | Data center management; Interpretable machine learning; Dynamic prediction model; Natural language processing; Feature extraction |
| Year: | 2023 |
| Country: | Portugal |
| Document type: | Master's dissertation |
| Access type: | Open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |
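The abstract describes EAMDrift as weighting each individual model's forecast according to a performance metric and retraining when needed, but this record does not state which metric or drift detector is used. The following is a minimal Python sketch of that general pattern under stated assumptions, not the dissertation's implementation: it assumes inverse mean absolute error as the performance metric and a simple rule that flags drift when the recent mean error exceeds a hypothetical `factor` times a reference error. All names (`inverse_error_weights`, `combine`, `DriftMonitor`) and the model names in the usage example are hypothetical.

```python
from collections import deque
from statistics import mean
from typing import Dict, Optional


def inverse_error_weights(errors: Dict[str, float], eps: float = 1e-9) -> Dict[str, float]:
    """Assumed metric: weight each model by 1/error, normalised to sum to 1.

    A lower past error yields a higher weight; eps avoids division by zero.
    """
    inv = {name: 1.0 / (err + eps) for name, err in errors.items()}
    total = sum(inv.values())
    return {name: w / total for name, w in inv.items()}


def combine(forecasts: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the individual model forecasts."""
    return sum(weights[name] * value for name, value in forecasts.items())


class DriftMonitor:
    """Hypothetical drift rule: flag drift when the rolling mean absolute error
    exceeds `factor` times the first full-window mean (the reference)."""

    def __init__(self, window: int = 5, factor: float = 2.0) -> None:
        self.recent: deque = deque(maxlen=window)
        self.reference: Optional[float] = None
        self.factor = factor

    def update(self, abs_error: float) -> bool:
        self.recent.append(abs_error)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough observations yet
        current = mean(self.recent)
        if self.reference is None:
            self.reference = current  # first full window sets the baseline
            return False
        return current > self.factor * self.reference

    def reset(self) -> None:
        """Call after retraining so the monitor learns a new reference."""
        self.recent.clear()
        self.reference = None


if __name__ == "__main__":
    # Hypothetical past errors and next-step forecasts of three base models.
    weights = inverse_error_weights({"naive": 4.0, "mean": 2.0, "trend": 1.0})
    forecast = combine({"naive": 100.0, "mean": 110.0, "trend": 120.0}, weights)
    print(weights, forecast)

    monitor = DriftMonitor(window=3, factor=2.0)
    for err in [1.0, 1.0, 1.0, 5.0]:
        if monitor.update(err):
            print("drift detected: retrain base models, then reset monitor")
            monitor.reset()
```

Because the weights are explicit per-model numbers, they can be inspected directly at each step, which is one simple way an ensemble of this kind can stay interpretable.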