Publication
Training samples from open data for satellite imagery classification: Using K-means clustering algorithm
| Abstract: | To create a land use/land cover (LULC) map from a satellite image, we can follow a supervised classification approach if we know what classes exist in the study area and if we have representative training samples for each class. However, in heterogeneous biophysical environments, the wide range of spectral signatures among LULC classes can bias the classification results. In this study, we generated training samples from the official 2015 Portuguese Land Cover Map (COS). In spite of the viability of this source of information (official reference data), we faced some problems with corrupted data and an unbalanced number of training samples per class. As such, we explored the K-means clustering technique in order to understand whether the data had critical issues and to select the most representative training samples by class for satellite imagery classification. We investigated the potential of this technique for LULC classification in a predominantly rural region characterized by a mixed agro-silvo-pastoral environment, which means there is a broad range of spectral signatures for each LULC class. Two image classifications for 2015 were performed using the random forest classifier. The first was done by using the most representative training samples selected from the statistical analysis, and the other was done by using the full generated training set (original training set). Ultimately, the present study demonstrates the improvements in overall accuracy between both image classifications (+8%), showing that the applied methodology has a positive impact on the results. |
|---|---|
| Main authors: | Viana, Cláudia M. |
| Other authors: | Girão, Inês; Rocha, Jorge |
| Subject: | Land use/land cover; Training set; Clustering; Landsat; Classification; Random forest |
| Year: | 2019 |
| Country: | Portugal |
| Document type: | book chapter |
| Access type: | open access |
| Associated institution: | Universidade de Lisboa |
| Language: | English |
| Source: | Repositório da Universidade de Lisboa |
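
The abstract describes using K-means clustering to pick the most representative training samples per class before classification. The sketch below is only an illustration of that general idea, not the authors' procedure: the abstract does not state the selection rule, so keeping the largest cluster within each class is an assumption, as are the function names (`kmeans`, `select_representative`), the two-band feature vectors, the choice of `k=2`, and the deterministic farthest-first initialisation.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-means; returns (centroids, labels).

    Assumption: farthest-first initialisation, chosen here only to keep
    the sketch deterministic. Requires 1 <= k <= len(points).
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def select_representative(samples, k=2):
    """Keep, for each class, only the samples in its largest cluster.

    `samples` is a list of (class_name, feature_vector) pairs.
    Assumption: "most representative" is read as "dominant cluster".
    """
    by_class = {}
    for cls, vec in samples:
        by_class.setdefault(cls, []).append(vec)

    kept = []
    for cls, vecs in by_class.items():
        if len(vecs) <= k:
            # Too few samples to cluster meaningfully; keep them all.
            kept.extend((cls, v) for v in vecs)
            continue
        _, labels = kmeans(vecs, k)
        dominant = Counter(labels).most_common(1)[0][0]
        kept.extend((cls, v) for v, lab in zip(vecs, labels) if lab == dominant)
    return kept


# Hypothetical (red, near-infrared) reflectances, invented for illustration.
forest_main = [(0.05, 0.40), (0.06, 0.41), (0.04, 0.39),
               (0.05, 0.42), (0.06, 0.40), (0.04, 0.41)]
forest_suspect = [(0.30, 0.10), (0.31, 0.11)]  # spectrally unlike the rest
water = [(0.02, 0.01), (0.03, 0.02)]

samples = ([("forest", v) for v in forest_main + forest_suspect]
           + [("water", v) for v in water])
kept = select_representative(samples, k=2)
print(len(samples), "samples in,", len(kept), "samples kept")
```

In a workflow like the one the abstract outlines, the reduced set would then be passed to a classifier such as a random forest and compared against a run on the full training set. A dominant-cluster rule always discards part of any class with more than `k` samples, which may be too aggressive for classes that are legitimately multimodal.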