Publication

Aircraft Detection in Satellite Imagery: A Comparative Study of Classic Vision Models and LLM

Bibliographic details
Abstract:	Automated aircraft detection in satellite imagery is critical for applications including air traffic monitoring, defense intelligence, and disaster response. Traditional approaches rely on supervised deep learning models that require extensive labeled data and computational resources but lack semantic understanding and interpretability. The emergence of multimodal Large Language Models (LLMs) with vision capabilities introduces an alternative paradigm: zero-shot detection through natural language interaction, without task-specific training. This study compares YOLOv8, a state-of-the-art supervised object detector, against ChatGPT-5, a multimodal LLM, for aircraft detection in satellite imagery. Using a curated dataset of 1,952 aircraft instances from the xView and DOTA benchmarks, both approaches were evaluated with standard object detection metrics, including mean Average Precision (mAP), precision, recall, and F1-score. YOLOv8 achieved substantially higher performance (mAP@50: 0.451, mAP@50-95: 0.264, F1-score: 0.581) than ChatGPT-5 (mAP@50: 0.139, mAP@50-95: 0.005, F1-score: 0.123), demonstrating that task-specific modelling remains superior for accurate detection. However, supplementary experiments with relaxed Intersection over Union (IoU) thresholds revealed that ChatGPT-5's limitations stem primarily from imprecise localization rather than fundamental recognition failure: under mAP@30 evaluation, performance improved by 93.5% for ChatGPT-5 but by only 4.4% for YOLOv8. The findings show fundamental trade-offs between precision and accessibility and point to complementary roles: supervised models excel in mission-critical applications requiring high accuracy, while LLMs make detection capabilities easier to access, including for non-experts, through natural language interfaces, despite lower quantitative performance.
This research contributes a rigorous evaluation framework for comparing traditional and emerging AI paradigms in remote sensing, providing empirical evidence of current LLM capabilities and limitations while identifying opportunities for hybrid approaches that leverage the complementary strengths of both methodologies.
Main author:	Cabral, Samuel Costa
Subjects:	Aircraft detection; Satellite imagery; Large Language Models; Remote sensing; Object detection
Year:	2026
Country:	Portugal
Document type:	master's dissertation
Access type:	open access
Associated institution:	Universidade Nova de Lisboa
Language:	English
Source:	Repositório Institucional da UNL (UNL Institutional Repository)
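The abstract's point that relaxed IoU thresholds mainly help ChatGPT-5 depends on how predicted boxes are matched to ground-truth aircraft. The sketch below is an illustrative assumption, not the dissertation's evaluation code: it uses axis-aligned boxes in a hypothetical `(x1, y1, x2, y2)` pixel format, a common greedy, confidence-ordered, one-to-one matching rule, and made-up boxes. The helper names `iou`, `match_counts`, and `precision_recall_f1` are hypothetical. It computes single-threshold precision, recall, and F1 only; mAP additionally summarizes precision across the confidence-ranked precision-recall curve (and, for mAP@50-95, averages over IoU thresholds from 0.50 to 0.95).

```python
# Illustrative sketch only: hypothetical helpers, box format, and data.
Box = tuple[float, float, float, float]  # assumed (x1, y1, x2, y2), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_counts(preds: list[tuple[Box, float]], gts: list[Box], thr: float) -> tuple[int, int, int]:
    """Greedy one-to-one matching at IoU >= thr; returns (TP, FP, FN)."""
    matched: set[int] = set()
    tp = 0
    # Highest-confidence predictions claim ground-truth boxes first.
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            v = iou(box, gt)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp  # predictions that found no partner
    fn = len(gts) - tp    # aircraft that were never matched
    return tp, fp, fn


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    p = tp / (tp + fp) if tp + fp > 0 else 0.0
    r = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1


# Made-up example: two aircraft, one tight prediction, one shifted by 4 px.
gts: list[Box] = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds: list[tuple[Box, float]] = [
    ((1, 1, 11, 11), 0.9),    # IoU ~0.68 with the first aircraft
    ((24, 20, 34, 30), 0.8),  # IoU ~0.43 with the second aircraft
]

if __name__ == "__main__":
    for thr in (0.5, 0.3):
        counts = match_counts(preds, gts, thr)
        print(thr, counts, precision_recall_f1(*counts))
```

Here the shifted box is a false positive (and its aircraft a false negative) at IoU 0.5, giving precision, recall, and F1 of 0.5, but becomes a true positive at 0.3, lifting all three to 1.0. This mirrors the pattern the abstract attributes to imprecise localization rather than failed recognition.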