Publication

Road Network Detection and Route Travel Time Estimation from Satellite Imagery

Bibliographic details
Abstract: Accurately and quickly extracting road networks from high-resolution satellite images is essential for urban planning, disaster response, and autonomous navigation. Standard Convolutional Neural Networks (CNNs) perform well at semantic segmentation, but their focus on local features makes it hard for them to keep roads connected when obstacles such as trees or building shadows occlude them. To address this, this thesis examines Hybrid Vision Transformers, focusing on SegFormer with the MiT-B3 encoder. SegFormer uses self-attention to capture global context and is a leading model in computer vision, though it has not been widely tested for mapping road networks. The transformer was compared with a state-of-the-art CNN with dense dilated convolutions (DeepLabV3+ D3S2PP), which is designed for multi-scale context, and with the widely used ResNet50 U-Net, which serves as the main benchmark for the chosen dataset. Most current evaluation methods use pixel-based metrics such as Intersection over Union (IoU). However, IoU only measures area overlap and does not account for road connectivity. For example, missing just one pixel can break a major road and make a route unusable, yet this barely affects the IoU score. Because effective routing is fundamental in GIS, this thesis uses an evaluation framework that goes beyond pixel accuracy. Graph-based metrics, including Average Path Length Similarity (APLS) and the Weisfeiler-Lehman (WL) kernel, were used to measure directly how well each model preserves road structure and connectivity. In addition, a new width-based travel time metric was introduced to measure the real-world impact of topological mistakes. Experiments on the SpaceNet 3 dataset show that the Hybrid Transformer achieves superior connectivity, significantly outperforming the ResNet baseline in structured cities such as Las Vegas (APLS of 0.78 vs. 0.59).
However, Transformers sometimes make confident mistakes, predicting false road connections in the background. To address gaps in road connections, a VGG19-based TopologyAware perceptual loss was added to the training process for all evaluated models. This helped recover more road pixels (raising IoU by about 0.10 for all models), but it did not improve actual routing: APLS decreased by less than 0.05 on average. This shows that recovering missing road areas and fixing key connection gaps are separate challenges. Post-processing strategies were also tested; they improved connectivity but at times created false connections and deleted real ones. A multi-city domain generalization analysis also found a major drop in performance when models trained on structured environments were tested on new, high-density urban areas. For example, in the dense and unstructured city of Mumbai, pixel-level detection was moderate (IoU about 0.40), but graph connectivity failed completely (APLS below 0.01). This highlights the serious impact of domain gaps caused by vertical obstructions and different spectral signatures, exposes key limitations in current transfer learning methods, and points to the need for future research in adaptive topological road extraction.
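The abstract's point that a single missing pixel can break a route while barely changing IoU can be illustrated with a toy example. The sketch below is not taken from the thesis: the grid size, the one-pixel-wide road, the helper names `iou` and `connected`, and the use of 8-connectivity are all assumptions chosen for illustration, and the breadth-first search stands in for the graph-based metrics (APLS, WL kernel) only in spirit.

```python
from collections import deque


def iou(truth, pred):
    """Intersection over Union of two boolean masks (lists of rows)."""
    pairs = [(t, p) for rt, rp in zip(truth, pred) for t, p in zip(rt, rp)]
    inter = sum(1 for t, p in pairs if t and p)
    union = sum(1 for t, p in pairs if t or p)
    return inter / union if union else 1.0


def connected(mask, start, goal):
    """Breadth-first search over 8-connected road pixels (assumed rule)."""
    h, w = len(mask), len(mask[0])
    if not (mask[start[0]][start[1]] and mask[goal[0]][goal[1]]):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < h and 0 <= nc < w
                if inside and mask[nr][nc] and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return False


# Hypothetical toy scene: a straight, one-pixel-wide road across the tile.
HEIGHT, WIDTH, ROAD_ROW = 5, 100, 2
truth = [[r == ROAD_ROW for _ in range(WIDTH)] for r in range(HEIGHT)]

# Prediction identical to the ground truth except for one dropped road pixel.
pred = [row[:] for row in truth]
pred[ROAD_ROW][50] = False

score = iou(truth, pred)
routable = connected(pred, (ROAD_ROW, 0), (ROAD_ROW, WIDTH - 1))
print(f"IoU = {score:.2f}, end-to-end route exists: {routable}")
```

In this toy case the prediction keeps 99 of the 100 road pixels, so the overlap score stays close to 1 while the road can no longer be traversed from end to end. A path-based metric penalizes exactly this kind of break.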
Main authors: Malki, El Mehdi Gassa
Subject: Artificial Neural Network; Deep Learning; Satellite Imagery; Road Network Extraction; Remote Sensing; Semantic Segmentation; Graph Topology
Year: 2026
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Universidade Nova de Lisboa
Language: English
Source: Repositório Institucional da UNL
_version_ 1868415511702798336
author Malki, El Mehdi Gassa
author_facet Malki, El Mehdi Gassa
author_role author
contributor_name_str_mv Baptista, Márcia Lourenço
Painho, Marco Octávio Trindade
Feitosa, Filipe
RUN
country_str PT
creators_json_txt [{"Person.name":"Malki, El Mehdi Gassa"}]
datacite.creators.creator.creatorName.fl_str_mv Malki, El Mehdi Gassa
datacite.date.Accepted.fl_str_mv 2026-02-26T00:00:00Z
datacite.date.available.fl_str_mv 2026-03-11T13:50:57Z
datacite.date.embargoed.fl_str_mv 2026-03-11T13:50:57Z
datacite.rights.fl_str_mv http://purl.org/coar/access_right/c_abf2
datacite.subjects.subject.fl_str_mv Artificial Neural Network
Deep Learning
Satellite Imagery
Road Network Extraction
Remote Sensing
Semantic Segmentation
Graph Topology
datacite.titles.title.fl_str_mv Road Network Detection and Route Travel Time Estimation from Satellite Imagery
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv http://hdl.handle.net/10362/201262
dc.language.none.fl_str_mv eng
dc.rights.cclincense.fl_str_mv http://creativecommons.org/licenses/by/4.0/
dc.rights.none.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.title.fl_str_mv Road Network Detection and Route Travel Time Estimation from Satellite Imagery
dc.type.none.fl_str_mv http://purl.org/coar/resource_type/c_bdcc
dirty 0
eu_rights_str_mv openAccess
format masterThesis
fulltext.url.fl_str_mv https://run.unl.pt/bitstreams/2b4e9f66-e67b-4b1d-b56d-4dddeaf66c26/download
id run_59dfdab90b0b04fd896468ff83c8a516
identifier.url.fl_str_mv http://hdl.handle.net/10362/201262
instacron_str unl
institution Universidade Nova de Lisboa
instname_str Universidade Nova de Lisboa
language eng
network_acronym_str run
network_name_str Repositório Institucional da UNL
oai_identifier_str oai:run.unl.pt:10362/201262
organization_str_mv urn:organizationAcronym:unl
person_str_mv Malki, El Mehdi Gassa
publishDate 2026
reponame_str Repositório Institucional da UNL
repository_id_str urn:repositoryAcronym:run
service_str_mv urn:repositoryAcronym:run
spelling HostingInstitution: RUN; e-mail: run@unl.pt; URN: urn:tid:204232287; file size: 2416425 bytes
status SINGLETON
title Road Network Detection and Route Travel Time Estimation from Satellite Imagery
url http://hdl.handle.net/10362/201262
visible 1