Publication
MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations
| Abstract: | Understanding the relation between the meanings of words is an important part of comprehending natural language. Prior work has either focused on analysing lexical semantic relations in word embeddings or probing pretrained language models (PLMs), with some exceptions. Given the rarity of highly multilingual benchmarks, it is unclear to what extent PLMs capture relational knowledge and are able to transfer it across languages. To start addressing this question, we propose MultiLexBATS, a multilingual parallel dataset of lexical semantic relations adapted from BATS in 15 languages including low-resource languages, such as Bambara, Lithuanian, and Albanian. As an experiment on cross-lingual transfer of relational knowledge, we test the PLMs’ ability to (1) capture analogies across languages, and (2) predict translation targets. We find considerable differences across relation types and languages, with a clear preference for hypernymy and antonymy as well as Romance languages. |
|---|---|
| Main author: | Gromann, Dagmar |
| Other authors: | Oliveira, Hugo Gonçalo; Pitarch, Lucia; Apostol, Elena-Simona; Bernad, Jordi; Bytyçi, Eliot; Cantone, Chiara; Carvalho, Sara; Frontini, Francesca; Garabik, Radovan; Gracia, Jorge; Granata, Letzia; Khan, Fahad; Knez, Timotej; Labropoulou, Penny; Liebeskind, Chaya; Buono, Maria Pia di; Anić, Ana Ostroški; Rackevičienė, Sigita; Rodrigues, Ricardo; Sérasse, Gilles; Selmistraitis, Linas; Sidibé, Mahammadou; Silvano, Purificação; Spahiu, Blerina; Sogutlu, Enriketa; Stanković, Ranka; Truică, Ciprian-Octavian; Oleškevičienė, Giedrė Valūnaitė; Zitnik, Slavko; Zdravkova, Katerina |
| Subject: | Lexical Semantic Relations; Multilingual Benchmark; BATS |
| Year: | 2024 |
| Country: | Portugal |
| Document type: | conference paper |
| Access type: | open access |
| Associated institution: | Instituto Politécnico de Coimbra |
| Language: | English |
| Source: | Instituto Politécnico de Coimbra |
| _version_ | 1867027066350731264 |
|---|---|
| author | Gromann, Dagmar |
| author2 | Oliveira, Hugo Gonçalo Pitarch, Lucia Apostol, Elena-Simona Bernad, Jordi Bytyçi, Eliot Cantone, Chiara Carvalho, Sara Frontini, Francesca Garabik, Radovan Gracia, Jorge Granata, Letzia Khan, Fahad Knez, Timotej Labropoulou, Penny Liebeskind, Chaya Buono, Maria Pia di Anić, Ana Ostroški Rackevičienė, Sigita Rodrigues, Ricardo Sérasse, Gilles Selmistraitis, Linas Sidibé, Mahammadou Silvano, Purificação Spahiu, Blerina Sogutlu, Enriketa Stanković, Ranka Truică, Ciprian-Octavian Oleškevičienė, Giedrė Valūnaitė Zitnik, Slavko Zdravkova, Katerina |
| author2_role | author author author author author author author author author author author author author author author author author author author author author author author author author author author author author author |
| author_facet | Gromann, Dagmar Oliveira, Hugo Gonçalo Pitarch, Lucia Apostol, Elena-Simona Bernad, Jordi Bytyçi, Eliot Cantone, Chiara Carvalho, Sara Frontini, Francesca Garabik, Radovan Gracia, Jorge Granata, Letzia Khan, Fahad Knez, Timotej Labropoulou, Penny Liebeskind, Chaya Buono, Maria Pia di Anić, Ana Ostroški Rackevičienė, Sigita Rodrigues, Ricardo Sérasse, Gilles Selmistraitis, Linas Sidibé, Mahammadou Silvano, Purificação Spahiu, Blerina Sogutlu, Enriketa Stanković, Ranka Truică, Ciprian-Octavian Oleškevičienė, Giedrė Valūnaitė Zitnik, Slavko Zdravkova, Katerina |
| author_role | author |
| contributor_name_str_mv | Repositório Comum |
| country_str | PT |
| creators_json_txt | [{"Person.name":"Gromann, Dagmar"},{"Person.name":"Oliveira, Hugo Gonçalo"},{"Person.name":"Pitarch, Lucia"},{"Person.name":"Apostol, Elena-Simona"},{"Person.name":"Bernad, Jordi"},{"Person.name":"Bytyçi, Eliot"},{"Person.name":"Cantone, Chiara"},{"Person.name":"Carvalho, Sara"},{"Person.name":"Frontini, Francesca"},{"Person.name":"Garabik, Radovan"},{"Person.name":"Gracia, Jorge"},{"Person.name":"Granata, Letzia"},{"Person.name":"Khan, Fahad"},{"Person.name":"Knez, Timotej"},{"Person.name":"Labropoulou, Penny"},{"Person.name":"Liebeskind, Chaya"},{"Person.name":"Buono, Maria Pia di"},{"Person.name":"Anić, Ana Ostroški"},{"Person.name":"Rackevičienė, Sigita"},{"Person.name":"Rodrigues, Ricardo","Person.identifier.orcid":"0000-0002-6262-7920"},{"Person.name":"Sérasse, Gilles"},{"Person.name":"Selmistraitis, Linas"},{"Person.name":"Sidibé, Mahammadou"},{"Person.name":"Silvano, Purificação"},{"Person.name":"Spahiu, Blerina"},{"Person.name":"Sogutlu, Enriketa"},{"Person.name":"Stanković, Ranka"},{"Person.name":"Truică, Ciprian-Octavian"},{"Person.name":"Oleškevičienė, Giedrė Valūnaitė"},{"Person.name":"Zitnik, Slavko"},{"Person.name":"Zdravkova, Katerina"}] |
| datacite.contributors.contributor.contributorName.fl_str_mv | Repositório Comum |
| datacite.creators.creator.creatorName.fl_str_mv | Gromann, Dagmar Oliveira, Hugo Gonçalo Pitarch, Lucia Apostol, Elena-Simona Bernad, Jordi Bytyçi, Eliot Cantone, Chiara Carvalho, Sara Frontini, Francesca Garabik, Radovan Gracia, Jorge Granata, Letzia Khan, Fahad Knez, Timotej Labropoulou, Penny Liebeskind, Chaya Buono, Maria Pia di Anić, Ana Ostroški Rackevičienė, Sigita Rodrigues, Ricardo Sérasse, Gilles Selmistraitis, Linas Sidibé, Mahammadou Silvano, Purificação Spahiu, Blerina Sogutlu, Enriketa Stanković, Ranka Truică, Ciprian-Octavian Oleškevičienė, Giedrė Valūnaitė Zitnik, Slavko Zdravkova, Katerina |
| datacite.date.Accepted.fl_str_mv | 2024-01-01T00:00:00Z |
| datacite.date.available.fl_str_mv | 2026-06-01T13:43:02Z |
| datacite.date.embargoed.fl_str_mv | 2026-06-01T13:43:02Z |
| datacite.rights.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| datacite.subjects.subject.fl_str_mv | Lexical Semantic Relations Multilingual Benchmark BATS |
| datacite.titles.title.fl_str_mv | MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations |
| dc.contributor.none.fl_str_mv | Repositório Comum |
| dc.creator.none.fl_str_mv | Gromann, Dagmar Oliveira, Hugo Gonçalo Pitarch, Lucia Apostol, Elena-Simona Bernad, Jordi Bytyçi, Eliot Cantone, Chiara Carvalho, Sara Frontini, Francesca Garabik, Radovan Gracia, Jorge Granata, Letzia Khan, Fahad Knez, Timotej Labropoulou, Penny Liebeskind, Chaya Buono, Maria Pia di Anić, Ana Ostroški Rackevičienė, Sigita Rodrigues, Ricardo Sérasse, Gilles Selmistraitis, Linas Sidibé, Mahammadou Silvano, Purificação Spahiu, Blerina Sogutlu, Enriketa Stanković, Ranka Truică, Ciprian-Octavian Oleškevičienė, Giedrė Valūnaitė Zitnik, Slavko Zdravkova, Katerina |
| dc.date.Accepted.fl_str_mv | 2024-01-01T00:00:00Z |
| dc.date.available.fl_str_mv | 2026-06-01T13:43:02Z |
| dc.date.embargoed.fl_str_mv | 2026-06-01T13:43:02Z |
| dc.format.none.fl_str_mv | application/pdf |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/10400.26/63453 |
| dc.language.none.fl_str_mv | eng |
| dc.rights.cclincense.fl_str_mv | http://creativecommons.org/licenses/by-nc/4.0/ |
| dc.rights.none.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| dc.subject.none.fl_str_mv | Lexical Semantic Relations Multilingual Benchmark BATS |
| dc.title.fl_str_mv | MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations |
| dc.type.none.fl_str_mv | http://purl.org/coar/resource_type/c_5794 |
| description | Understanding the relation between the meanings of words is an important part of comprehending natural language. Prior work has either focused on analysing lexical semantic relations in word embeddings or probing pretrained language models (PLMs), with some exceptions. Given the rarity of highly multilingual benchmarks, it is unclear to what extent PLMs capture relational knowledge and are able to transfer it across languages. To start addressing this question, we propose MultiLexBATS, a multilingual parallel dataset of lexical semantic relations adapted from BATS in 15 languages including low-resource languages, such as Bambara, Lithuanian, and Albanian. As an experiment on cross-lingual transfer of relational knowledge, we test the PLMs’ ability to (1) capture analogies across languages, and (2) predict translation targets. We find considerable differences across relation types and languages, with a clear preference for hypernymy and antonymy as well as Romance languages. |
| dirty | 0 |
| eu_rights_str_mv | openAccess |
| format | conferencePaper |
| fulltext.url.fl_str_mv | https://comum.rcaap.pt/bitstreams/79eb3ef1-40e1-400b-8378-e5751e68b2b4/download |
| id | ipc_0abd301532013aed2db6daafa70df09c |
| identifier.url.fl_str_mv | http://hdl.handle.net/10400.26/63453 |
| instacron_str | ipc |
| institution | Instituto Politécnico de Coimbra |
| instname_str | Instituto Politécnico de Coimbra |
| language | eng |
| network_acronym_str | ipc |
| network_name_str | Instituto Politécnico de Coimbra |
| oai_identifier_str | oai:comum.rcaap.pt:10400.26/63453 |
| organization_str_mv | urn:organizationAcronym:ipc |
| person_str_mv | Gromann, Dagmar Oliveira, Hugo Gonçalo Pitarch, Lucia Apostol, Elena-Simona Bernad, Jordi Bytyçi, Eliot Cantone, Chiara Carvalho, Sara Frontini, Francesca Garabik, Radovan Gracia, Jorge Granata, Letzia Khan, Fahad Knez, Timotej Labropoulou, Penny Liebeskind, Chaya Buono, Maria Pia di Anić, Ana Ostroški Rackevičienė, Sigita Rodrigues, Ricardo Rodrigues, Ricardo https://www.ciencia-id.pt/D31C-FB4A-FEAA D31C-FB4A-FEAA http://orcid.org/0000-0002-6262-7920 0000-0002-6262-7920 Sérasse, Gilles Selmistraitis, Linas Sidibé, Mahammadou Silvano, Purificação Spahiu, Blerina Sogutlu, Enriketa Stanković, Ranka Truică, Ciprian-Octavian Oleškevičienė, Giedrė Valūnaitė Zitnik, Slavko Zdravkova, Katerina |
| publishDate | 2024 |
| reponame_str | Instituto Politécnico de Coimbra |
| repository_id_str | urn:repositoryAcronym:ipc |
| service_str_mv | urn:repositoryAcronym:ipc |
| spelling | Title: MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations; Published in: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 11783-11793, Torino, 2024; Language: eng; Format: application/pdf, 324579 bytes; Resource type: other research product, conference paper (http://purl.org/coar/resource_type/c_5794); Access: open access (http://purl.org/coar/access_right/c_abf2); License: http://creativecommons.org/licenses/by-nc/4.0/; Handle: http://hdl.handle.net/10400.26/63453; Full text: https://comum.rcaap.pt/bitstreams/79eb3ef1-40e1-400b-8378-e5751e68b2b4/download; Subjects: Lexical Semantic Relations, Multilingual Benchmark, BATS; Dates: 2026-06-01T13:43:02Z, 2024-01-01T00:00:00Z; Person identifiers (Rodrigues, Ricardo): DSpace item c64ccf7c-eca2-43cf-a4a2-78e684499c00, Ciência ID D31C-FB4A-FEAA, ORCID 0000-0002-6262-7920; Hosting institution: Repositório Comum (mailto:comum@rcaap.pt) |
| spellingShingle | MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations Gromann, Dagmar Lexical Semantic Relations Multilingual Benchmark BATS |
| status | SINGLETON |
| subject.fl_str_mv | Lexical Semantic Relations Multilingual Benchmark BATS |
| title | MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations |
| title_full | MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations |
| title_fullStr | MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations |
| title_full_unstemmed | MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations |
| title_short | MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations |
| title_sort | MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations |
| topic | Lexical Semantic Relations Multilingual Benchmark BATS |
| topic_facet | Lexical Semantic Relations Multilingual Benchmark BATS |
| url | http://hdl.handle.net/10400.26/63453 |
| visible | 1 |