Publication

MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations

Bibliographic details
Abstract: Understanding the relation between the meanings of words is an important part of comprehending natural language. Prior work has focused either on analysing lexical semantic relations in word embeddings or on probing pretrained language models (PLMs), with some exceptions. Given the rarity of highly multilingual benchmarks, it is unclear to what extent PLMs capture relational knowledge and are able to transfer it across languages. To start addressing this question, we propose MultiLexBATS, a multilingual parallel dataset of lexical semantic relations adapted from BATS in 15 languages, including low-resource languages such as Bambara, Lithuanian, and Albanian. As an experiment on cross-lingual transfer of relational knowledge, we test the PLMs’ ability to (1) capture analogies across languages, and (2) predict translation targets. We find considerable differences across relation types and languages, with a clear preference for hypernymy and antonymy as well as Romance languages.
Main author: Gromann, Dagmar
Other authors: Oliveira, Hugo Gonçalo; Pitarch, Lucia; Apostol, Elena-Simona; Bernad, Jordi; Bytyçi, Eliot; Cantone, Chiara; Carvalho, Sara; Frontini, Francesca; Garabik, Radovan; Gracia, Jorge; Granata, Letzia; Khan, Fahad; Knez, Timotej; Labropoulou, Penny; Liebeskind, Chaya; Buono, Maria Pia di; Anić, Ana Ostroški; Rackevičienė, Sigita; Rodrigues, Ricardo; Sérasse, Gilles; Selmistraitis, Linas; Sidibé, Mahammadou; Silvano, Purificação; Spahiu, Blerina; Sogutlu, Enriketa; Stanković, Ranka; Truică, Ciprian-Octavian; Oleškevičienė, Giedrė Valūnaitė; Zitnik, Slavko; Zdravkova, Katerina
Subject: Lexical Semantic Relations; Multilingual Benchmark; BATS
Year: 2024
Country: Portugal
Document type: conference paper
Access type: open access
Associated institution: Instituto Politécnico de Coimbra
Language: English
Source: Instituto Politécnico de Coimbra
_version_ 1867027066350731264
author Gromann, Dagmar
author2 Oliveira, Hugo Gonçalo
Pitarch, Lucia
Apostol, Elena-Simona
Bernad, Jordi
Bytyçi, Eliot
Cantone, Chiara
Carvalho, Sara
Frontini, Francesca
Garabik, Radovan
Gracia, Jorge
Granata, Letzia
Khan, Fahad
Knez, Timotej
Labropoulou, Penny
Liebeskind, Chaya
Buono, Maria Pia di
Anić, Ana Ostroški
Rackevičienė, Sigita
Rodrigues, Ricardo
Sérasse, Gilles
Selmistraitis, Linas
Sidibé, Mahammadou
Silvano, Purificação
Spahiu, Blerina
Sogutlu, Enriketa
Stanković, Ranka
Truică, Ciprian-Octavian
Oleškevičienė, Giedrė Valūnaitė
Zitnik, Slavko
Zdravkova, Katerina
author_role author
contributor_name_str_mv Repositório Comum
country_str PT
dc.contributor.none.fl_str_mv Repositório Comum
dc.creator.none.fl_str_mv Gromann, Dagmar
Oliveira, Hugo Gonçalo
Pitarch, Lucia
Apostol, Elena-Simona
Bernad, Jordi
Bytyçi, Eliot
Cantone, Chiara
Carvalho, Sara
Frontini, Francesca
Garabik, Radovan
Gracia, Jorge
Granata, Letzia
Khan, Fahad
Knez, Timotej
Labropoulou, Penny
Liebeskind, Chaya
Buono, Maria Pia di
Anić, Ana Ostroški
Rackevičienė, Sigita
Rodrigues, Ricardo
Sérasse, Gilles
Selmistraitis, Linas
Sidibé, Mahammadou
Silvano, Purificação
Spahiu, Blerina
Sogutlu, Enriketa
Stanković, Ranka
Truică, Ciprian-Octavian
Oleškevičienė, Giedrė Valūnaitė
Zitnik, Slavko
Zdravkova, Katerina
dc.date.Accepted.fl_str_mv 2024-01-01T00:00:00Z
dc.date.available.fl_str_mv 2026-06-01T13:43:02Z
dc.date.embargoed.fl_str_mv 2026-06-01T13:43:02Z
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv http://hdl.handle.net/10400.26/63453
dc.language.none.fl_str_mv eng
dc.rights.cclincense.fl_str_mv http://creativecommons.org/licenses/by-nc/4.0/
dc.rights.none.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.subject.none.fl_str_mv Lexical Semantic Relations
Multilingual Benchmark
BATS
dc.title.fl_str_mv MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations
dc.type.none.fl_str_mv http://purl.org/coar/resource_type/c_5794
description Understanding the relation between the meanings of words is an important part of comprehending natural language. Prior work has focused either on analysing lexical semantic relations in word embeddings or on probing pretrained language models (PLMs), with some exceptions. Given the rarity of highly multilingual benchmarks, it is unclear to what extent PLMs capture relational knowledge and are able to transfer it across languages. To start addressing this question, we propose MultiLexBATS, a multilingual parallel dataset of lexical semantic relations adapted from BATS in 15 languages, including low-resource languages such as Bambara, Lithuanian, and Albanian. As an experiment on cross-lingual transfer of relational knowledge, we test the PLMs’ ability to (1) capture analogies across languages, and (2) predict translation targets. We find considerable differences across relation types and languages, with a clear preference for hypernymy and antonymy as well as Romance languages.
dirty 0
eu_rights_str_mv openAccess
format conferencePaper
fulltext.url.fl_str_mv https://comum.rcaap.pt/bitstreams/79eb3ef1-40e1-400b-8378-e5751e68b2b4/download
id ipc_0abd301532013aed2db6daafa70df09c
identifier.url.fl_str_mv http://hdl.handle.net/10400.26/63453
instacron_str ipc
institution Instituto Politécnico de Coimbra
instname_str Instituto Politécnico de Coimbra
language eng
network_acronym_str ipc
network_name_str Instituto Politécnico de Coimbra
oai_identifier_str oai:comum.rcaap.pt:10400.26/63453
organization_str_mv urn:organizationAcronym:ipc
person_str_mv Gromann, Dagmar
Oliveira, Hugo Gonçalo
Pitarch, Lucia
Apostol, Elena-Simona
Bernad, Jordi
Bytyçi, Eliot
Cantone, Chiara
Carvalho, Sara
Frontini, Francesca
Garabik, Radovan
Gracia, Jorge
Granata, Letzia
Khan, Fahad
Knez, Timotej
Labropoulou, Penny
Liebeskind, Chaya
Buono, Maria Pia di
Anić, Ana Ostroški
Rackevičienė, Sigita
Rodrigues, Ricardo (Ciência ID: D31C-FB4A-FEAA, https://www.ciencia-id.pt/D31C-FB4A-FEAA; ORCID: 0000-0002-6262-7920, http://orcid.org/0000-0002-6262-7920)
Sérasse, Gilles
Selmistraitis, Linas
Sidibé, Mahammadou
Silvano, Purificação
Spahiu, Blerina
Sogutlu, Enriketa
Stanković, Ranka
Truică, Ciprian-Octavian
Oleškevičienė, Giedrė Valūnaitė
Zitnik, Slavko
Zdravkova, Katerina
publishDate 2024
reponame_str Instituto Politécnico de Coimbra
repository_id_str urn:repositoryAcronym:ipc
service_str_mv urn:repositoryAcronym:ipc
spelling MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations. In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 11783-11793, Torino, 2024. Conference paper, application/pdf, 324579 bytes. Hosting institution: Repositório Comum (comum@rcaap.pt).
status SINGLETON
title MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations
topic Lexical Semantic Relations
Multilingual Benchmark
BATS
url http://hdl.handle.net/10400.26/63453
visible 1