Publication

Architecture for scalable deployment of AI models


Bibliographic details
Abstract: This work presents a modular architecture for the scalable deployment of Artificial Intelligence (AI) models that combines Infrastructure-as-Code, container orchestration, and automated, observability-driven control loops. The system provisions compute resources on on-premises Proxmox environments using Terraform, applies post-provision configuration with Ansible, orchestrates containerized services through Docker Swarm, serves Machine Learning (ML) models via TorchServe, and stores and visualizes operational metrics using InfluxDB and Grafana. The final design closes an autonomous feedback loop in which Grafana alerts trigger a backend that executes Terraform actions to add or remove worker nodes; newly created machines are configured and joined to the cluster automatically by Ansible. The prototype was validated with two pretrained image classification models (ResNet-18 and DenseNet-161), demonstrating functional correctness (idempotent provisioning, service replication, load balancing, and failover) and performance benefits under load when elastic scaling is enabled. While the approach proved portable between Amazon Web Services (AWS) and Proxmox and effective for medium-scale workloads, the evaluation surfaced practical constraints that limit responsiveness to short bursts, most notably Virtual Machine (VM) provisioning latency and a five-minute alert resolution delay. The architecture meets its primary objectives of scalable, automated model serving with minimal operator intervention, and the work outlines opportunities for reducing reaction time (e.g., container-level scaling before VM creation) and enhancing scheduling sophistication.
Main authors: Costa, João Vítor Nogueira da
Subject: Infrastructure as code (IaC); Terraform; AWS; Docker; Auto-scaling; ML provisioning
Year: 2025
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Instituto Politécnico de Bragança
Language: English
Source: Biblioteca Digital do IPB
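Illustrative note: the feedback loop described in the abstract (a Grafana alert triggers a backend, which runs Terraform to add or remove worker nodes) might be sketched roughly as below. This is not the dissertation's implementation. The `ScalingPolicy` class, the helper function names, the worker limits, and the Terraform variable name `worker_count` are all hypothetical, and the mapping of a "firing" alert to scale-out and a "resolved" alert to scale-in is an assumption. The sketch only builds the command line and does not execute it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical bounds for the worker pool (not taken from the source)."""
    min_workers: int = 1
    max_workers: int = 4
    step: int = 1


def desired_workers(current: int, alert_status: str, policy: ScalingPolicy) -> int:
    """Map an alert status to a target worker count, clamped to the policy bounds.

    Assumption: "firing" means the cluster is overloaded (scale out) and
    "resolved" means the load has dropped (scale in).
    """
    if alert_status == "firing":
        target = current + policy.step
    elif alert_status == "resolved":
        target = current - policy.step
    else:
        target = current  # unknown status: leave the cluster unchanged
    return max(policy.min_workers, min(policy.max_workers, target))


def terraform_command(worker_count: int) -> list[str]:
    """Build a non-interactive `terraform apply` invocation.

    `worker_count` is a hypothetical input variable of the Terraform configuration.
    """
    return [
        "terraform",
        "apply",
        "-auto-approve",
        "-input=false",
        f"-var=worker_count={worker_count}",
    ]


def handle_alert(
    payload: dict,
    current: int,
    policy: ScalingPolicy = ScalingPolicy(),
) -> tuple[int, list[str] | None]:
    """Return the new worker count and the command to run, or None if nothing changes."""
    target = desired_workers(current, payload.get("status", ""), policy)
    if target == current:
        return current, None
    return target, terraform_command(target)
```

In a setup like the one the abstract describes, a backend would pass the returned command to a process runner, after which Ansible would configure each new node and join it to the cluster. Clamping to fixed bounds keeps a burst of repeated alerts from growing the pool without limit.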