Publication

Road Network Detection and Route Travel Time Estimation from Satellite Imagery

Bibliographic details
Abstract:	Accurately and quickly extracting road networks from high-resolution satellite images is essential for urban planning, disaster response, and autonomous navigation. Standard Convolutional Neural Networks (CNNs) perform well at semantic segmentation, but their focus on local features makes it hard for them to keep roads connected when obstacles such as trees or building shadows occlude them. To address this, this thesis examines hybrid Vision Transformers, focusing on SegFormer with a MiT-B3 encoder. SegFormer uses self-attention to capture global context and is a leading model in computer vision, although it has not been widely tested for road network mapping. The transformer was compared with a state-of-the-art CNN that uses dense dilated convolutions for multi-scale context (DeepLabV3+ with D3S2PP) and with the widely used ResNet50 U-Net, the main benchmark for the chosen dataset. Most current evaluation methods rely on pixel-based metrics such as Intersection over Union (IoU). However, IoU measures only area overlap and does not account for road connectivity: missing a single pixel can sever a major road and make a route unusable, yet it barely changes the IoU score. Because effective routing is fundamental in GIS, this thesis uses a complete evaluation framework that goes beyond pixel accuracy. Graph-based metrics, including Average Path Length Similarity (APLS) and the Weisfeiler-Lehman (WL) kernel, were used to directly measure how well each model preserves road structure and connectivity. In addition, a new width-based travel-time metric was introduced to measure the real-world impact of topological errors. Experiments on the SpaceNet 3 dataset show that the hybrid transformer achieves superior connectivity, significantly outperforming the ResNet baseline in structured cities such as Las Vegas (APLS of 0.78 vs. 0.59).
However, transformers sometimes make confident mistakes, predicting false road connections in background regions. To close gaps in road connections, a VGG19-based TopologyAware perceptual loss was added to the training of all evaluated models. This helped recover more road pixels (raising IoU by about 0.10 for all models) but did not improve actual routing: APLS decreased by less than 0.05 on average. This shows that recovering missing road areas and repairing critical connection gaps are separate challenges. Post-processing strategies were also tested; they improved connectivity but at times created false connections and deleted real ones. A multi-city domain generalization analysis also found a major drop in performance when models trained on structured environments were tested on new, high-density urban areas. For example, in the dense, unstructured city of Mumbai, pixel-level detection was moderate (IoU of about 0.40), but graph connectivity failed completely (APLS below 0.01). This highlights the severe impact of domain gaps caused by vertical obstructions and differing spectral signatures, exposes key limitations of current transfer learning methods, and points to the need for future research on adaptive topological road extraction.
Main authors:	Malki, El Mehdi Gassa
Subjects:	Artificial Neural Network; Deep Learning; Satellite Imagery; Road Network Extraction; Remote Sensing; Semantic Segmentation; Graph Topology
Year:	2026
Country:	Portugal
Document type:	master's thesis
Access type:	open access
Associated institution:	Universidade Nova de Lisboa
Language:	English
Source:	Repositório Institucional da UNL
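The abstract's claim that a one-pixel break barely moves IoU but can make a route unusable can be checked on a toy mask. The sketch below is a minimal illustration, not the thesis's evaluation code. It makes three assumptions. Masks are hypothetical sets of (row, col) pixels. Path length is a 4-connected breadth-first search on the pixel grid rather than on an extracted road graph. `apls_pair_score` implements only the single-pair APLS term, 1 - min(1, |L_gt - L_pred| / L_gt), with a missing path scoring 0. The full APLS averages this term over many node pairs of graphs built from the skeletonized masks, usually in both directions.

```python
from collections import deque

def iou(pred, gt):
    """Intersection over Union of two binary masks given as sets of (row, col) pixels."""
    union = pred | gt
    return len(pred & gt) / len(union) if union else 1.0

def path_length(mask, start, goal):
    """Shortest 4-connected path length (in pixel steps) inside a mask, or None if disconnected."""
    if start not in mask or goal not in mask:
        return None
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nxt in mask and nxt not in dist:
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return None

def apls_pair_score(len_gt, len_pred):
    """Single-pair APLS-style score: 1 - min(1, |L_gt - L_pred| / L_gt); a missing path scores 0."""
    if len_pred is None:
        return 0.0
    return 1.0 - min(1.0, abs(len_gt - len_pred) / len_gt)

# Hypothetical ground truth: a straight 1-pixel-wide road, 100 pixels long.
gt = {(0, c) for c in range(100)}
# Hypothetical prediction: identical except for one missing pixel in the middle.
pred = gt - {(0, 50)}

start, goal = (0, 0), (0, 99)
l_gt = path_length(gt, start, goal)
l_pred = path_length(pred, start, goal)
print(round(iou(pred, gt), 2))        # 0.99
print(l_gt, l_pred)                   # 99 None
print(apls_pair_score(l_gt, l_pred))  # 0.0
```

Removing 1 of 100 pixels leaves IoU at 0.99, while the route between the two ends of the road disappears and the pair receives the worst possible APLS term.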

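The abstract does not define the width-based travel-time metric. One plausible reading, sketched below purely as an assumption, works in three steps. First, derive a speed for each road segment from its estimated width. Second, convert segment lengths to travel times. Third, compare the fastest route time on the ground-truth graph with the fastest route time on the predicted graph. The following are all hypothetical choices for illustration, not the thesis's values: the width thresholds and speeds in `speed_from_width`, the `{(u, v): (length_m, width_m)}` edge format, and the APLS-style capped relative error in `travel_time_error`.

```python
import heapq

def speed_from_width(width_m):
    """HYPOTHETICAL mapping from road width (m) to free-flow speed (km/h)."""
    if width_m >= 10.0:
        return 60.0  # wide arterial
    if width_m >= 6.0:
        return 40.0  # collector street
    return 25.0      # narrow local street

def edge_time_s(length_m, width_m):
    """Travel time in seconds; multiplying by 3.6 converts km/h to m/s in the denominator."""
    return length_m * 3.6 / speed_from_width(width_m)

def fastest_time(edges, source, target):
    """Dijkstra over an undirected graph {(u, v): (length_m, width_m)}; None if unreachable."""
    adj = {}
    for (u, v), (length_m, width_m) in edges.items():
        t = edge_time_s(length_m, width_m)
        adj.setdefault(u, []).append((v, t))
        adj.setdefault(v, []).append((u, t))
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if node == target:
            return t
        if t > best[node]:
            continue  # stale queue entry
        for nxt, dt in adj.get(node, []):
            if t + dt < best.get(nxt, float("inf")):
                best[nxt] = t + dt
                heapq.heappush(heap, (t + dt, nxt))
    return None

def travel_time_error(gt_edges, pred_edges, source, target):
    """HYPOTHETICAL capped relative error of route time; 1.0 (worst) if the route is lost."""
    t_gt = fastest_time(gt_edges, source, target)
    t_pred = fastest_time(pred_edges, source, target)
    if t_pred is None:
        return 1.0
    return min(1.0, abs(t_pred - t_gt) / t_gt)

# Hypothetical network: a wide arterial A-B-C and a narrower detour A-D-C.
gt = {("A", "B"): (900, 12), ("B", "C"): (900, 12), ("A", "D"): (600, 8), ("D", "C"): (900, 8)}
pred = {k: v for k, v in gt.items() if k != ("B", "C")}  # one gap breaks the arterial
print(fastest_time(gt, "A", "C"))               # 108.0
print(fastest_time(pred, "A", "C"))             # 135.0
print(travel_time_error(gt, pred, "A", "C"))    # 0.25
```

In this toy case a single gap on the arterial still leaves a route, so a pure connectivity check would pass. A travel-time comparison, by contrast, exposes the 25% slower detour. This is the kind of real-world impact the abstract says the metric is meant to capture.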
_version_	1868415511702798336
author	Malki, El Mehdi Gassa
author_facet	Malki, El Mehdi Gassa
author_role	author
contributor_name_str_mv	Baptista, Márcia Lourenço; Painho, Marco Octávio Trindade; Feitosa, Filipe; RUN
country_str	PT
creators_json_txt	[{"Person.name":"Malki, El Mehdi Gassa"}]
datacite.contributors.contributor.contributorName.fl_str_mv	Baptista, Márcia Lourenço; Painho, Marco Octávio Trindade; Feitosa, Filipe; RUN
datacite.creators.creator.creatorName.fl_str_mv	Malki, El Mehdi Gassa
datacite.date.Accepted.fl_str_mv	2026-02-26T00:00:00Z
datacite.date.available.fl_str_mv	2026-03-11T13:50:57Z
datacite.date.embargoed.fl_str_mv	2026-03-11T13:50:57Z
datacite.rights.fl_str_mv	http://purl.org/coar/access_right/c_abf2
datacite.subjects.subject.fl_str_mv	Artificial Neural Network; Deep Learning; Satellite Imagery; Road Network Extraction; Remote Sensing; Semantic Segmentation; Graph Topology
datacite.titles.title.fl_str_mv	Road Network Detection and Route Travel Time Estimation from Satellite Imagery
dc.contributor.none.fl_str_mv	Baptista, Márcia Lourenço; Painho, Marco Octávio Trindade; Feitosa, Filipe; RUN
dc.creator.none.fl_str_mv	Malki, El Mehdi Gassa
dc.date.Accepted.fl_str_mv	2026-02-26T00:00:00Z
dc.date.available.fl_str_mv	2026-03-11T13:50:57Z
dc.date.embargoed.fl_str_mv	2026-03-11T13:50:57Z
dc.format.none.fl_str_mv	application/pdf
dc.identifier.none.fl_str_mv	http://hdl.handle.net/10362/201262
dc.language.none.fl_str_mv	eng
dc.rights.cclincense.fl_str_mv	http://creativecommons.org/licenses/by/4.0/
dc.rights.none.fl_str_mv	http://purl.org/coar/access_right/c_abf2
dc.subject.none.fl_str_mv	Artificial Neural Network; Deep Learning; Satellite Imagery; Road Network Extraction; Remote Sensing; Semantic Segmentation; Graph Topology
dc.title.fl_str_mv	Road Network Detection and Route Travel Time Estimation from Satellite Imagery
dc.type.none.fl_str_mv	http://purl.org/coar/resource_type/c_bdcc
dirty	0
eu_rights_str_mv	openAccess
format	masterThesis
fulltext.url.fl_str_mv	https://run.unl.pt/bitstreams/2b4e9f66-e67b-4b1d-b56d-4dddeaf66c26/download
id	run_59dfdab90b0b04fd896468ff83c8a516
identifier.url.fl_str_mv	http://hdl.handle.net/10362/201262
instacron_str	unl
institution	Universidade Nova de Lisboa
instname_str	Universidade Nova de Lisboa
language	eng
network_acronym_str	run
network_name_str	Repositório Institucional da UNL
oai_identifier_str	oai:run.unl.pt:10362/201262
organization_str_mv	urn:organizationAcronym:unl
person_str_mv	Malki, El Mehdi Gassa
publishDate	2026
reponame_str	Repositório Institucional da UNL
repository_id_str	urn:repositoryAcronym:run
service_str_mv	urn:repositoryAcronym:run
status	SINGLETON
subject.fl_str_mv	Artificial Neural Network; Deep Learning; Satellite Imagery; Road Network Extraction; Remote Sensing; Semantic Segmentation; Graph Topology
title	Road Network Detection and Route Travel Time Estimation from Satellite Imagery
title_full	Road Network Detection and Route Travel Time Estimation from Satellite Imagery
title_fullStr	Road Network Detection and Route Travel Time Estimation from Satellite Imagery
title_full_unstemmed	Road Network Detection and Route Travel Time Estimation from Satellite Imagery
title_short	Road Network Detection and Route Travel Time Estimation from Satellite Imagery
title_sort	Road Network Detection and Route Travel Time Estimation from Satellite Imagery
topic	Artificial Neural Network; Deep Learning; Satellite Imagery; Road Network Extraction; Remote Sensing; Semantic Segmentation; Graph Topology
topic_facet	Artificial Neural Network; Deep Learning; Satellite Imagery; Road Network Extraction; Remote Sensing; Semantic Segmentation; Graph Topology
url	http://hdl.handle.net/10362/201262
visible	1
