Publication

SAQL: query language for corpora with morpho-syntactic annotation


Bibliographic details
Abstract: Computer-mediated communication becomes more prevalent with each passing day, be it in social media, blogs or forums. These media gather large numbers of people from different backgrounds and provide places where opposing ideals can clash. Such clashes can devolve into attacks, inappropriate language and, in more extreme cases, hate speech. Detecting these cases is difficult, both because of the large amount of data posted online and because of the language itself: its many idiosyncrasies limit automatic classification efforts. The aim of this thesis was to develop a system capable of processing texts and of identifying and annotating within them certain syntactic patterns typically present in hate speech. This main purpose can be split into two goals: the morpho-syntactic annotation of online texts, together with a query engine to search for patterns present in the corpus; and the identification and classification of hate speech in an online medium. As a case study, the corpus extracted from online platforms by the NetLang Project was used. To fulfill these goals, a pre-processing system was implemented whose annotations feed both the classification system and the query system. The hate speech classification system was developed with a mixed methodology, applying manual linguistic analysis to the results of the automatic methods in order to classify instances of hate speech. The system was tested, and its results were compared with the statistical classification. The query system comprised the formulation of the query language and the creation of the corresponding query engine, which allows users to search the annotated corpus for particular sequences in the texts. To evaluate the usability of the query engine, an experiment was carried out, gathering feedback from potential end users.
Main author: Pereira, Ana Filipa Vilela
Subjects: Computer mediated communication; Hate speech classification; Morpho-syntactic annotation; Natural language processing (keywords given in both English and Portuguese)
Year: 2022
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Universidade do Minho
Language: Spanish
Source: RepositóriUM - Universidade do Minho
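The abstract describes a query engine that searches a morpho-syntactically annotated corpus for particular sequences in the texts. The sketch below illustrates only that general idea; it is not SAQL syntax and not the thesis's engine. The `Token` layout (form, lemma, POS), the use of Universal POS tags, the example sentence and the functions `matches` and `find_sequences` are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Token:
    """One annotated token (hypothetical layout: surface form, lemma, POS tag)."""
    form: str
    lemma: str
    pos: str  # Universal POS tags are assumed here, not the thesis's tag set


def matches(token: Token, constraint: Dict[str, str]) -> bool:
    """A token satisfies a constraint if every listed attribute has the given value."""
    return all(getattr(token, attr) == value for attr, value in constraint.items())


def find_sequences(
    sentence: List[Token], pattern: List[Dict[str, str]]
) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, where the pattern matches contiguously."""
    width = len(pattern)
    return [
        (start, start + width)
        for start in range(len(sentence) - width + 1)
        if all(matches(sentence[start + i], pattern[i]) for i in range(width))
    ]


# Hypothetical annotated sentence: "These people are always late ."
SENTENCE = [
    Token("These", "these", "DET"),
    Token("people", "person", "NOUN"),
    Token("are", "be", "AUX"),
    Token("always", "always", "ADV"),
    Token("late", "late", "ADJ"),
    Token(".", ".", "PUNCT"),
]

# Pattern: a determiner, then a noun, then any form of the lemma "be"
print(find_sequences(SENTENCE, [{"pos": "DET"}, {"pos": "NOUN"}, {"lemma": "be"}]))
# prints [(0, 3)]
```

Each pattern element constrains one token by any mix of attributes, so lexical (lemma) and grammatical (POS) conditions can be combined in one query. Corpus query languages typically also offer optional or repeated tokens and gaps; this sketch covers only fixed-length contiguous sequences.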
Contributors: Henriques, Pedro Rangel; Araújo, Cristiana; Universidade do Minho
Date accepted: 2022-04-05T00:00:00Z
Date available: 2022-06-07T08:55:30Z
Embargo date: 2022-06-07T08:55:30Z
Access rights: http://purl.org/coar/access_right/c_abf2 (open access)
Alternative title (Portuguese): SAQL: linguagem de interrogação para corpora com anotação morfossintática
Format: application/pdf
Handle: https://hdl.handle.net/1822/78258
License: CC BY-NC 4.0 (http://creativecommons.org/licenses/by-nc/4.0/)
Resource type: http://purl.org/coar/resource_type/c_bdcc (master thesis)
Full text: https://prod-dspace.uminho.pt/bitstreams/a5cd57f4-1754-4c34-9d64-e02a1508c161/download
Record ID: rum_5aee3f8dcf8a7a12b97345bf082edd89
OAI identifier: oai:repositorium.uminho.pt:1822/78258
File size: 1542288 bytes
URN: urn:tid:202995879
Repository contact: repositorium@usdb.uminho.pt