Publication

NLPyPort: Named Entity Recognition with CRF and Rule-Based Relation Extraction

Bibliographic details
Abstract: This paper describes the application of the NLPyPort pipeline to Named Entity Recognition (NER) and Relation Extraction in Portuguese, more precisely in the scope of the IberLEF-2019 evaluation task on the topic. NER was tackled with a CRF, based on several features and trained on the HAREM collection, but results were low. This was partly caused by an issue with the submitted model, which had been trained on lowercase text, but apparently also by the training data used, which highlights the different natures of HAREM, the source of the majority of the testing corpus, and SIGARRA. Relations were extracted with a set of rules bootstrapped from the examples provided by the organisation. Despite an F1-score of 0.72, we were the only participants in this task. We also express our doubts concerning the utility of the extracted relations.
Main author: Ferreira, João
Other authors: Oliveira, Hugo Gonçalo; Rodrigues, Ricardo
Subject: NLP; NER; CRF; Relation Extraction; PoS Tagging; Pattern Based
Year: 2019
Country: Portugal
Document type: conference object
Access type: open access
Associated institution: Instituto Politécnico de Coimbra
Language: English
Source: Instituto Politécnico de Coimbra
Handle: http://hdl.handle.net/10400.26/61202
Full text: https://comum.rcaap.pt/bitstreams/c80337e9-bd96-4463-892e-8d61fa8929c3/download
Format: application/pdf
License: http://creativecommons.org/licenses/by/4.0/
Access rights: http://purl.org/coar/access_right/c_abf2 (openAccess)
Resource type: http://purl.org/coar/resource_type/c_c94f (conferenceObject)
Date accepted: 2019-01-01T00:00:00Z
Date available: 2026-01-26T17:06:12Z
Date embargoed: 2026-01-26T17:06:12Z
Contributor: Repositório Comum
Author identifiers: Rodrigues, Ricardo; ORCID http://orcid.org/0000-0002-6262-7920; Ciência ID https://www.ciencia-id.pt/D31C-FB4A-FEAA
OAI identifier: oai:comum.rcaap.pt:10400.26/61202
Record ID: ipc_50e2abc2f09c3cd12673023e05539d64
Published in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), pages 468-476, 2019
File size: 888306 bytes
Repository contact: comum@rcaap.pt