Publication
Road Network Detection and Route Travel Time Estimation from Satellite Imagery
| Abstract: | Accurately and quickly extracting road networks from high-resolution satellite images is essential for urban planning, disaster response, and autonomous navigation. Standard Convolutional Neural Networks (CNNs) perform well at semantic segmentation, but their focus on local features makes it hard for them to keep roads connected when obstacles such as trees or building shadows occlude them. To address this, this thesis examines Hybrid Vision Transformers, focusing on SegFormer with an MiT-B3 encoder. SegFormer uses self-attention to capture global context and is a leading model in computer vision, but it has not been widely tested for road network mapping. This transformer was compared with a state-of-the-art CNN that uses dense dilated convolutions (DeepLabV3+ with D3S2PP), designed for multi-scale context, and with the widely used ResNet50 U-Net, which serves as the main benchmark in this field for the chosen dataset. Most current evaluation methods rely on pixel-based metrics such as Intersection over Union (IoU). However, IoU measures only area overlap and does not account for road connectivity: missing a single pixel can break a major road and make a route unusable while barely affecting the IoU score. Because effective routing is fundamental in GIS, this thesis adopts an evaluation framework that goes beyond pixel accuracy. Graph-based metrics, including Average Path Length Similarity (APLS) and the Weisfeiler-Lehman (WL) kernel, were used to measure directly how well each model preserves road structure and connectivity. In addition, a new width-based travel time metric was introduced to quantify the real-world impact of topological errors. Experiments on the SpaceNet 3 dataset show that the Hybrid Transformer achieves superior connectivity, significantly outperforming the ResNet baseline in structured cities such as Las Vegas (APLS of 0.78 vs. 0.59). However, the Transformer sometimes makes confident mistakes, predicting false road connections in the background. To close gaps in road connections, a VGG19-based TopologyAware perceptual loss was added to the training of all evaluated models. This recovered more road pixels (raising IoU by about 0.10 for all models) but did not improve actual routing: APLS decreased by less than 0.05 on average. This shows that recovering missing road area and fixing critical connection gaps are separate challenges. Post-processing strategies were also tested; they improved connectivity but at times created false connections and deleted real ones. A multi-city domain generalization analysis also found a major drop in performance when models trained on structured environments were tested on new, high-density urban areas. For example, in the dense, unstructured city of Mumbai, pixel-level detection was moderate (IoU of about 0.40), but graph connectivity failed completely (APLS below 0.01). This highlights the serious impact of domain gaps caused by vertical obstructions and differing spectral signatures, exposes key limitations of current transfer learning methods, and points to the need for future research on adaptive topological road extraction. |
|---|---|
| Main author: | Malki, El Mehdi Gassa |
| Subject: | Artificial Neural Network; Deep Learning; Satellite Imagery; Road Network Extraction; Remote Sensing; Semantic Segmentation; Graph Topology |
| Year: | 2026 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |
| Contributors: | Baptista, Márcia Lourenço; Painho, Marco Octávio Trindade; Feitosa, Filipe; RUN (hosting institution) |
| Date accepted: | 2026-02-26 |
| Date available: | 2026-03-11T13:50:57Z |
| Embargo date: | 2026-03-11T13:50:57Z |
| Access rights: | http://purl.org/coar/access_right/c_abf2 |
| Format: | application/pdf |
| Handle: | http://hdl.handle.net/10362/201262 |
| License: | http://creativecommons.org/licenses/by/4.0/ (CC BY 4.0) |
| Resource type: | http://purl.org/coar/resource_type/c_bdcc (master thesis) |
| Full text: | https://run.unl.pt/bitstreams/2b4e9f66-e67b-4b1d-b56d-4dddeaf66c26/download |
| OAI identifier: | oai:run.unl.pt:10362/201262 |
| URN: | urn:tid:204232287 |
| File size: | 2416425 bytes |
| Hosting institution e-mail: | run@unl.pt |
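
The abstract argues that a one-pixel break barely moves IoU while severing a route. A toy mask makes this concrete. The sketch below uses only the Python standard library and is illustrative, not code from the thesis: the tile size, road width, and the helper names `iou` and `connected` are assumptions chosen for the example.

```python
from collections import deque


def iou(pred, truth):
    """Intersection over Union of two equally sized binary masks (lists of 0/1 rows)."""
    cells = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    inter = sum(p & t for p, t in cells)
    union = sum(p | t for p, t in cells)
    return inter / union if union else 1.0


def connected(mask, start, goal):
    """Breadth-first search over 4-connected road pixels: can `goal` be reached from `start`?"""
    rows, cols = len(mask), len(mask[0])
    if not (mask[start[0]][start[1]] and mask[goal[0]][goal[1]]):
        return False
    seen, queue = {start}, deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


# Ground truth: a straight road 3 pixels wide crossing a 9 x 100 pixel tile.
truth = [[1 if 3 <= r <= 5 else 0 for _ in range(100)] for r in range(9)]

# Prediction: identical except for a one-pixel-wide gap across the road at column 50.
pred = [row[:] for row in truth]
for r in range(3, 6):
    pred[r][50] = 0

print(f"IoU = {iou(pred, truth):.3f}")                 # IoU = 0.990
print("connected:", connected(pred, (4, 0), (4, 99)))  # connected: False
```

Losing 3 of 300 road pixels leaves IoU at 0.99, yet the two ends of the road are no longer linked, which is exactly the failure a pixel metric cannot see.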
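
APLS, named in the abstract as one of the graph-based metrics, scores a predicted road graph by comparing shortest-path lengths between node pairs. The following is a simplified, one-direction sketch under stated assumptions: node IDs are already matched between the two graphs, and the node snapping and reverse (proposal onto ground truth) pass of the reference SpaceNet implementation are omitted. The function names are hypothetical.

```python
import heapq


def undirected(edges):
    """Adjacency dict {node: {neighbor: length}} from (u, v, length) tuples."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj


def shortest_lengths(adj, source):
    """Dijkstra: shortest path length from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def apls_one_way(gt, prop):
    """Simplified one-direction APLS: 1 - mean over node pairs of min(1, |L_gt - L_prop| / L_gt).

    Assumes both graphs already share node IDs (no node snapping). A pair connected in
    the ground truth but not in the proposal receives the maximum penalty of 1.
    """
    nodes = sorted(gt)
    penalties = []
    for i, a in enumerate(nodes):
        d_gt, d_prop = shortest_lengths(gt, a), shortest_lengths(prop, a)
        for b in nodes[i + 1:]:
            l_gt = d_gt.get(b)
            if not l_gt:
                continue  # unreachable (or zero-length) in the ground truth: skip the pair
            l_prop = d_prop.get(b)
            penalties.append(1.0 if l_prop is None else min(1.0, abs(l_gt - l_prop) / l_gt))
    return 1.0 - sum(penalties) / len(penalties) if penalties else 0.0


# Ground truth: a straight road A-B-C-D made of 100 m segments.
gt = undirected([("A", "B", 100), ("B", "C", 100), ("C", "D", 100)])
# Proposal: the same road with one missing segment between B and C.
broken = undirected([("A", "B", 100), ("C", "D", 100)])

print(apls_one_way(gt, gt))                # 1.0
print(round(apls_one_way(gt, broken), 3))  # 0.333
```

A single missing segment disconnects four of the six node pairs, so the score drops to about 0.33 even though almost all of the road is still present.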
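
The Weisfeiler-Lehman (WL) kernel mentioned in the abstract compares graph structure by repeatedly relabeling each node with its own label plus the sorted labels of its neighbors, then comparing the resulting label histograms. A minimal WL subtree kernel sketch follows. Starting from node degrees, using two refinement iterations, and applying cosine normalization are assumptions made for illustration; the record does not state the thesis's configuration.

```python
from collections import Counter
from math import sqrt


def wl_features(adj, iterations, table):
    """Weisfeiler-Lehman subtree features: counts of node labels over all iterations.

    Initial labels are node degrees (an assumption for unlabeled road graphs).
    `table` compresses (iteration, label, sorted neighbor labels) signatures to integers
    and must be shared between the graphs being compared so that labels line up.
    """
    labels = {v: len(adj[v]) for v in adj}
    features = Counter(("degree", d) for d in labels.values())
    for it in range(iterations):
        new_labels = {}
        for v in adj:
            signature = (it, labels[v], tuple(sorted(labels[u] for u in adj[v])))
            new_labels[v] = table.setdefault(signature, len(table))
        labels = new_labels
        features.update(labels.values())
    return features


def wl_kernel(g, h, iterations=2):
    """Normalized WL subtree kernel in [0, 1]; 1.0 means identical WL feature counts."""
    table = {}
    fg, fh = wl_features(g, iterations, table), wl_features(h, iterations, table)

    def dot(a, b):
        return sum(count * b[key] for key, count in a.items())

    return dot(fg, fh) / sqrt(dot(fg, fg) * dot(fh, fh))


# Ground truth: a four-way intersection X. Prediction: the western arm is missing.
star = {"X": {"N", "S", "E", "W"}, "N": {"X"}, "S": {"X"}, "E": {"X"}, "W": {"X"}}
tee = {"X": {"N", "S", "E"}, "N": {"X"}, "S": {"X"}, "E": {"X"}}

print(wl_kernel(star, star))           # 1.0
print(round(wl_kernel(star, tee), 3))  # 0.307
```

Because the missing arm changes the junction's degree, every refined label involving the junction differs, and the similarity falls well below 1 even though only one short segment is absent.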
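
The record does not describe how the width-based travel time metric is defined, so the next sketch is only a hypothetical illustration of the idea. Each edge's estimated road width is mapped to a speed (the bins in `SPEED_BINS` are invented for this example), edge traversal time is length divided by speed, and the route time is the fastest path found by Dijkstra's algorithm. A topological gap then appears as a longer detour or an infinite time rather than as a small pixel error.

```python
import heapq
import math

# HYPOTHETICAL width-to-speed bins: (upper width bound in m, speed in km/h).
# These values are invented for illustration; they are not taken from the thesis.
SPEED_BINS = [(5.0, 30.0), (10.0, 50.0), (math.inf, 80.0)]


def speed_kmh(width_m):
    """Map an estimated road width (m) to a speed (km/h) using SPEED_BINS."""
    for max_width, speed in SPEED_BINS:
        if width_m < max_width:
            return speed
    raise ValueError(f"invalid width: {width_m}")


def route_time_s(edges, source, target):
    """Fastest travel time in seconds between two nodes (Dijkstra); math.inf if disconnected.

    edges: iterable of (u, v, length_m, width_m) for undirected road segments.
    """
    adj = {}
    for u, v, length_m, width_m in edges:
        seconds = length_m * 3.6 / speed_kmh(width_m)  # km/h -> m/s is a division by 3.6
        adj.setdefault(u, []).append((v, seconds))
        adj.setdefault(v, []).append((u, seconds))
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            return t
        if t > best[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if t + w < best.get(v, math.inf):
                best[v] = t + w
                heapq.heappush(heap, (t + w, v))
    return math.inf


gt_edges = [
    ("A", "B", 1000, 12.0), ("B", "C", 1000, 12.0),  # wide arterial
    ("A", "D", 700, 4.0), ("D", "C", 700, 4.0),      # narrow residential detour
]
pred_edges = [e for e in gt_edges if e[:2] != ("B", "C")]  # prediction with a gap on the arterial

t_gt = route_time_s(gt_edges, "A", "C")
t_pred = route_time_s(pred_edges, "A", "C")
print(t_gt, t_pred)                                        # 90.0 168.0
print(f"relative error: {abs(t_pred - t_gt) / t_gt:.1%}")  # relative error: 86.7%
```

Here a single gap on the arterial forces the narrow detour and raises the route time from 90 s to 168 s, roughly an 87% relative error that an overlap metric would not register.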
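
The abstract reports that post-processing improved connectivity but sometimes created false connections. The record does not say which strategies were tested, so the sketch below assumes a hypothetical, naive rule that links any two dead-end nodes closer than a distance threshold. It is enough to show how one rule can repair a real gap and invent false links in the same pass; `bridge_gaps` and the coordinates are illustrative only.

```python
from itertools import combinations
from math import dist


def bridge_gaps(nodes, edges, max_gap):
    """Naive gap bridging: link every pair of dead-end nodes within `max_gap` metres.

    nodes: {node_id: (x, y)} in metres; edges: set of frozenset({u, v}).
    Returns (edges with bridges added, set of added bridges).
    """
    degree = {n: 0 for n in nodes}
    for edge in edges:
        for n in edge:
            degree[n] += 1
    dead_ends = [n for n in nodes if degree[n] == 1]
    added = {
        frozenset((a, b))
        for a, b in combinations(dead_ends, 2)
        if dist(nodes[a], nodes[b]) <= max_gap and frozenset((a, b)) not in edges
    }
    return edges | added, added


nodes = {
    "A": (0, 0), "B": (40, 0),    # road segment ending at B
    "C": (50, 0), "D": (100, 0),  # same road resumes at C: a true 10 m gap B-C
    "E": (45, 8), "F": (45, 60),  # separate cul-de-sac ending near the gap at E
}
edges = {frozenset(("A", "B")), frozenset(("C", "D")), frozenset(("E", "F"))}

new_edges, added = bridge_gaps(nodes, edges, max_gap=12)
print(sorted(tuple(sorted(e)) for e in added))
# [('B', 'C'), ('B', 'E'), ('C', 'E')]
```

The B-C bridge repairs the true gap, while B-E and C-E wrongly attach the cul-de-sac to the main road; tightening `max_gap` below about 9.4 m would suppress the false links but also leave the real 10 m gap open.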