Document details

Contributions for the improvement of specific class mapping

Author(s): Silva, Joel Dinis Baptista Ferreira da

Date: 2017

Persistent ID: http://hdl.handle.net/10362/33031

Origin: Repositório Institucional da UNL

Project/scholarship: info:eu-repo/grantAgreement/FCT/SFRH/SFRH%2FBD%2F84444%2F2012/PT;

Subject(s): Remote sensing; land cover mapping; specific class mapping; cost-sensitive learning; semi-supervised learning; single-class learning


Description

A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Geographic Information Systems

The analysis of remotely sensed imagery has become a fundamental task for many environment-centred activities, not just scientific but also management related. In particular, the use of land cover maps depicting a particular study site is an integral part of many research projects, as they are not just a fundamental variable in environmental models but also base information supporting policy decisions. Land cover mapping assisted by supervised classification is today a staple tool of any analyst processing remotely sensed data, insomuch as these techniques allow users to map entire sites of interest in a comprehensive way. Many remote sensing projects, however, are interested in only a small number of the land cover classes present in a study area and not in all the classes that make up the landscape. When the focus is on a particular subset of classes of interest, conventional supervised classification may be sub-optimal for the discrimination of these specific target classes. The process of producing a non-exhaustive map, that is, one depicting only the classes of interest to the user, is called specific class mapping. This is the topic of this dissertation.

Here, specific class mapping is examined to understand its origins, developments, adoption and current limitations. The main research goal is to contribute to the understanding and improvement of this topic, while presenting its main constraints in a clear way and proposing enhanced methods within the reach of the non-expert user. In detail, this study starts by analysing the definition of specific class mapping and why the conventional multi-class supervised classification process may yield sub-optimal outcomes. Attention then turns to the previous works that have tackled this problem. From here a synthesis is made, categorising and characterising previous methodologies. It is then learnt that the methodologies tackling specific class mapping fall under two broad categories, the binarisation approaches and the single-class approaches, and that neither type is without problems. This is the starting point of the development component of this dissertation, which branches out into three research lines.

First, cost-sensitive learning is utilised to improve specific class mapping. Previous studies have shown that specific class mapping may be susceptible to data imbalance problems in the training data set, since the classes of interest are often a small part of the training set. As a result, the classification may be biased towards the largest classes and, thus, be sub-optimal for the discrimination of the classes of interest. Here, cost-sensitive learning is used to balance the training data set and minimise the effects of data imbalance. In this approach, errors committed in the minority class are treated as being costlier than errors committed in the majority class. Cost-sensitive approaches are typically implemented by weighting training data points according to their importance to the analysis. By shifting the weight of the data set from the majority class to the minority class, the user is able to inform the learning process that training data points in the minority class are as critical as the points in the majority class. The results of this study indicate that this simple approach is capable of improving the process of specific class mapping by increasing the accuracy with which the classes of interest are discriminated.

Second, the combined use of single-class classifiers for specific class mapping is explored. Supervised algorithms for single-class classification are particularly attractive due to their reduced training requirements. Unlike other methods, which require training data for all classes present in the study site regardless of their relevance to the user's objective, single-class classifiers rely exclusively on training data from the class of interest. However, these methods can only solve classification problems with one class of interest. If more classes are important, they cannot be directly utilised. Here, three methodologies are proposed for combining single-class classifiers to map subsets of land cover classes. The results indicate that an intelligent combination of single-class classifiers can be used to achieve accurate results, statistically non-inferior to the standard multi-class classification, without the need for an exhaustive training set, saving resources that can be allocated to other steps of the data analysis process.

Third, the combined use of cost-sensitive and semi-supervised learning to improve specific class mapping is explored. A limitation of the binary approaches to specific class mapping is that they still require training data from secondary classes, which may be costly to collect. On the other hand, a limitation of the single-class approaches is that, while requiring only training data from the specific classes of interest, they tend to overestimate the extent of the classes of interest. This is because the classifier is trained without information about the negative part of the classification space. A way to overcome this is with semi-supervised learning, where the data points for the negative class are randomly sampled from the classification space. However, that sample may include false negatives. To overcome this difficulty, cost-sensitive learning is utilised to mitigate the effect of these potentially mislabelled data points. Cost weights were defined here using an exponential model that assigns more weight to the negative data points that are more likely to be correctly labelled and less to the points that are more likely to be mislabelled. The results show that the accuracy achieved with the proposed method is statistically non-inferior to that achieved with standard binary classification, while requiring much less training effort.
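
The weighting idea in the third research line can be illustrated with a small sketch. The abstract states only that the cost weights follow an exponential model, so everything specific below is an assumption made for illustration: the function name `exponential_negative_weights`, the use of Euclidean distance to the nearest labelled positive as a proxy for how likely a sampled negative is to be correctly labelled, the form 1 - exp(-d / scale), and the `scale` parameter. It is not the thesis's actual formulation.

```python
import math
import random


def exponential_negative_weights(positives, candidates, scale=1.0):
    """Weight each randomly sampled negative by 1 - exp(-d / scale).

    d is the Euclidean distance from the candidate to the nearest labelled
    positive. ASSUMPTION: this functional form and the distance proxy are
    hypothetical; the source only says the weights follow an exponential
    model that trusts likely-correct negatives more.
    """
    weights = []
    for point in candidates:
        nearest = min(math.dist(point, pos) for pos in positives)
        weights.append(1.0 - math.exp(-nearest / scale))
    return weights


# Toy two-band feature space: labelled positives cluster near (0.2, 0.8).
positives = [(0.20, 0.80), (0.25, 0.75), (0.18, 0.82)]

# Negatives are drawn at random from the whole feature space, so some of
# them may actually belong to the class of interest (false negatives).
rng = random.Random(0)
candidates = [(rng.random(), rng.random()) for _ in range(5)]
candidates.append((0.21, 0.79))  # deliberately placed next to the positives

weights = exponential_negative_weights(positives, candidates, scale=0.25)
for point, weight in zip(candidates, weights):
    print(f"({point[0]:.2f}, {point[1]:.2f}) -> weight {weight:.3f}")
```

Under these assumptions, a sampled negative lying close to the labelled positives receives a weight near zero and therefore contributes little to training, while distant points approach a weight of one. The resulting list could then be supplied as per-sample weights to any classifier that accepts them.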

Document Type: Doctoral thesis
Language: English
Advisor(s): Caetano, Mário Sílvio Rochinha de Andrade; Bação, Fernando José Ferreira Lucas
Contributor(s): RUN
CC Licence