Publication

NER in archival finding aids: extended

Bibliographic details
Abstract: The amount of information preserved in Portuguese archives has increased over the years. These documents are a national heritage of high importance, as they portray the country’s history. Most Portuguese archives have now made their finding aids available to the public in digital format; however, these data are not annotated, so their content is not always easy to analyze. In this work, we created Named Entity Recognition solutions that identify and classify several types of named entities in archival finding aids. These named entities carry crucial information about their context and, when recognized with high confidence, can be used for several purposes, for example, the creation of smart browsing tools through entity linking and record linking techniques. To achieve high scores, we annotated several corpora and trained our own Machine Learning models for this domain. We also used different architectures, such as CNNs, LSTMs, and Maximum Entropy models. Finally, all the created datasets and ML models were made publicly available through a web platform developed for this purpose, NER@DI.
Main authors: Cunha, Luís Filipe da Costa
Other authors: Ramalho, José Carlos
Subject: named entity recognition; archival search aids; machine learning; deep learning; maximum entropy
Year: 2022
Country: Portugal
Document type: article
Access type: open access
Associated institution: Universidade do Minho
Language: English
Source: RepositóriUM - Universidade do Minho
_version_ 1866877396172406784
author Cunha, Luís Filipe da Costa
author2 Ramalho, José Carlos
author2_role author
author_facet Cunha, Luís Filipe da Costa
Ramalho, José Carlos
author_role author
contributor_name_str_mv Universidade do Minho
country_str PT
creators_json_txt [{"Person.name":"Cunha, Luís Filipe da Costa"},{"Person.name":"Ramalho, José Carlos"}]
datacite.contributors.contributor.contributorName.fl_str_mv Universidade do Minho
datacite.creators.creator.creatorName.fl_str_mv Cunha, Luís Filipe da Costa
Ramalho, José Carlos
datacite.date.Accepted.fl_str_mv 2022-01-17T00:00:00Z
datacite.date.available.fl_str_mv 2022-03-29T11:56:39Z
datacite.date.embargoed.fl_str_mv 2022-03-29T11:56:39Z
datacite.rights.fl_str_mv http://purl.org/coar/access_right/c_abf2
datacite.subjects.subject.fl_str_mv named entity recognition
archival search aids
machine learning
deep learning
maximum entropy
datacite.titles.title.fl_str_mv NER in archival finding aids: extended
dc.contributor.none.fl_str_mv Universidade do Minho
dc.creator.none.fl_str_mv Cunha, Luís Filipe da Costa
Ramalho, José Carlos
dc.date.Accepted.fl_str_mv 2022-01-17T00:00:00Z
dc.date.available.fl_str_mv 2022-03-29T11:56:39Z
dc.date.embargoed.fl_str_mv 2022-03-29T11:56:39Z
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv https://hdl.handle.net/1822/76687
dc.language.none.fl_str_mv eng
dc.publisher.none.fl_str_mv Multidisciplinary Digital Publishing Institute
dc.rights.cclincense.fl_str_mv http://creativecommons.org/licenses/by/4.0/
dc.rights.none.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.rights.copyright.fl_str_mv openAccess
dc.subject.none.fl_str_mv named entity recognition
archival search aids
machine learning
deep learning
maximum entropy
dc.title.fl_str_mv NER in archival finding aids: extended
dc.type.none.fl_str_mv http://purl.org/coar/resource_type/c_6501
dirty 0
eu_rights_str_mv openAccess
format article
fulltext.url.fl_str_mv https://prod-dspace.uminho.pt/bitstreams/5cca2c0f-23ec-4175-8b9b-71dfe8a88006/download
id rum_3bf4d4d97acd57eaeff6a6cccbb4f549
identifier.url.fl_str_mv https://hdl.handle.net/1822/76687
instacron_str repositorium
institution Universidade do Minho
instname_str Universidade do Minho
language eng
network_acronym_str rum
network_name_str RepositóriUM - Universidade do Minho
oai_identifier_str oai:repositorium.uminho.pt:1822/76687
organization_str_mv urn:organizationAcronym:repositorium
person_str_mv Cunha, Luís Filipe da Costa
Ramalho, José Carlos
publishDate 2022
publisher.none.fl_str_mv Multidisciplinary Digital Publishing Institute
reponame_str RepositóriUM - Universidade do Minho
repository_id_str urn:repositoryAcronym:rum
service_str_mv urn:repositoryAcronym:rum
spelling DOI (IsPartOf): 10.3390/make4010003; hosting institution: Universidade do Minho (e-mail: repositorium@usdb.uminho.pt); dates: 2022-03-29T11:56:39Z, 2022-01-17, 2022-03-24T14:47:06Z, 2022-01-17T00:00:00Z; resource type: journal article (literature); file: application/pdf, 1728733 bytes
status SINGLETON
subject.fl_str_mv named entity recognition
archival search aids
machine learning
deep learning
maximum entropy
title NER in archival finding aids: extended
url https://hdl.handle.net/1822/76687
visible 1