Publication
NLPyPort: Named Entity Recognition with CRF and Rule-Based Relation Extraction
| Abstract: | This paper describes the application of the NLPyPort pipeline to Named Entity Recognition (NER) and Relation Extraction in Portuguese, specifically within the scope of the IberLEF-2019 evaluation task on the topic. NER was tackled with a CRF, based on several features and trained on the HAREM collection, but results were low. This was partly caused by an issue with the submitted model, which had been trained on lowercase text, but apparently also by the training data used, which highlights the different natures of HAREM, the source of the majority of the testing corpus, and SIGARRA. Relations were extracted with a set of rules bootstrapped from the examples provided by the organisation. We obtained an F1-score of 0.72, but we were the only participants in this task. We also express our doubts concerning the utility of the extracted relations. |
|---|---|
| Main authors: | Ferreira, João |
| Other authors: | Oliveira, Hugo Gonçalo; Rodrigues, Ricardo |
| Subject: | NLP; NER; CRF; Relation Extraction; PoS Tagging; Pattern Based |
| Year: | 2019 |
| Country: | Portugal |
| Document type: | conference object |
| Access type: | open access |
| Associated institution: | Instituto Politécnico de Coimbra |
| Language: | English |
| Source: | Instituto Politécnico de Coimbra |
| _version_ | 1868801021878206464 |
|---|---|
| author | Ferreira, João |
| author2 | Oliveira, Hugo Gonçalo; Rodrigues, Ricardo |
| author2_role | author; author |
| author_facet | Ferreira, João; Oliveira, Hugo Gonçalo; Rodrigues, Ricardo |
| author_role | author |
| contributor_name_str_mv | Repositório Comum |
| country_str | PT |
| creators_json_txt | [{"Person.name":"Ferreira, João"},{"Person.name":"Oliveira, Hugo Gonçalo"},{"Person.name":"Rodrigues, Ricardo","Person.identifier.orcid":"0000-0002-6262-7920"}] |
| datacite.contributors.contributor.contributorName.fl_str_mv | Repositório Comum |
| datacite.creators.creator.creatorName.fl_str_mv | Ferreira, João; Oliveira, Hugo Gonçalo; Rodrigues, Ricardo |
| datacite.date.Accepted.fl_str_mv | 2019-01-01T00:00:00Z |
| datacite.date.available.fl_str_mv | 2026-01-26T17:06:12Z |
| datacite.date.embargoed.fl_str_mv | 2026-01-26T17:06:12Z |
| datacite.rights.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| datacite.subjects.subject.fl_str_mv | NLP; NER; CRF; Relation Extraction; PoS Tagging; Pattern Based |
| datacite.titles.title.fl_str_mv | NLPyPort: Named Entity Recognition with CRF and Rule-Based Relation Extraction |
| dc.contributor.none.fl_str_mv | Repositório Comum |
| dc.creator.none.fl_str_mv | Ferreira, João; Oliveira, Hugo Gonçalo; Rodrigues, Ricardo |
| dc.date.Accepted.fl_str_mv | 2019-01-01T00:00:00Z |
| dc.date.available.fl_str_mv | 2026-01-26T17:06:12Z |
| dc.date.embargoed.fl_str_mv | 2026-01-26T17:06:12Z |
| dc.format.none.fl_str_mv | application/pdf |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/10400.26/61202 |
| dc.language.none.fl_str_mv | eng |
| dc.rights.cclincense.fl_str_mv | http://creativecommons.org/licenses/by/4.0/ |
| dc.rights.none.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| dc.subject.none.fl_str_mv | NLP; NER; CRF; Relation Extraction; PoS Tagging; Pattern Based |
| dc.title.fl_str_mv | NLPyPort: Named Entity Recognition with CRF and Rule-Based Relation Extraction |
| dc.type.none.fl_str_mv | http://purl.org/coar/resource_type/c_c94f |
| dirty | 0 |
| eu_rights_str_mv | openAccess |
| format | conferenceObject |
| fulltext.url.fl_str_mv | https://comum.rcaap.pt/bitstreams/c80337e9-bd96-4463-892e-8d61fa8929c3/download |
| id | ipc_50e2abc2f09c3cd12673023e05539d64 |
| identifier.url.fl_str_mv | http://hdl.handle.net/10400.26/61202 |
| instacron_str | ipc |
| institution | Instituto Politécnico de Coimbra |
| instname_str | Instituto Politécnico de Coimbra |
| language | eng |
| network_acronym_str | ipc |
| network_name_str | Instituto Politécnico de Coimbra |
| oai_identifier_str | oai:comum.rcaap.pt:10400.26/61202 |
| organization_str_mv | urn:organizationAcronym:ipc |
| person_str_mv | Ferreira, João; Oliveira, Hugo Gonçalo; Rodrigues, Ricardo (Ciência ID: https://www.ciencia-id.pt/D31C-FB4A-FEAA; ORCID: http://orcid.org/0000-0002-6262-7920) |
| publishDate | 2019 |
| reponame_str | Instituto Politécnico de Coimbra |
| repository_id_str | urn:repositoryAcronym:ipc |
| service_str_mv | urn:repositoryAcronym:ipc |
| spelling | Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), 468-476, 2019; DSpace item: http://dspace.org/items/c64ccf7c-eca2-43cf-a4a2-78e684499c00; hosting institution: Repositório Comum (comum@rcaap.pt); file: application/pdf, 888306 bytes |
| spellingShingle | NLPyPort: Named Entity Recognition with CRF and Rule-Based Relation Extraction; Ferreira, João; NLP; NER; CRF; Relation Extraction; PoS Tagging; Pattern Based |
| status | SINGLETON |
| subject.fl_str_mv | NLP; NER; CRF; Relation Extraction; PoS Tagging; Pattern Based |
| title | NLPyPort: Named Entity Recognition with CRF and Rule-Based Relation Extraction |
| topic | NLP; NER; CRF; Relation Extraction; PoS Tagging; Pattern Based |
| topic_facet | NLP; NER; CRF; Relation Extraction; PoS Tagging; Pattern Based |
| url | http://hdl.handle.net/10400.26/61202 |
| visible | 1 |