Publication

NLPyPort: Named Entity Recognition with CRF and Rule-Based Relation Extraction

Bibliographic details
Abstract:	This paper describes the application of the NLPyPort pipeline to Named Entity Recognition (NER) and Relation Extraction in Portuguese, more precisely in the scope of the IberLEF-2019 evaluation task on the topic. NER was tackled with a CRF, based on several features and trained on the HAREM collection, but results were low. This was partly caused by an issue with the submitted model, which had been trained on lowercased text, but apparently also by the training data used, which highlights the different natures of HAREM, the source of the majority of the testing corpus, and SIGARRA. Relations were extracted with a set of rules bootstrapped from the examples provided by the organisation. Despite an F1-score of 0.72, we were the only participants in this task. We also express our doubts concerning the utility of the extracted relations.
Main authors:	Ferreira, João
Other authors:	Oliveira, Hugo Gonçalo; Rodrigues, Ricardo
Subject:	NLP; NER; CRF; Relation Extraction; PoS Tagging; Pattern Based
Year:	2019
Country:	Portugal
Document type:	conference paper
Access type:	open access
Associated institution:	Instituto Politécnico de Coimbra
Language:	English
Source:	Instituto Politécnico de Coimbra
