Publication

MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations

Bibliographic details
Abstract: Understanding the relation between the meanings of words is an important part of comprehending natural language. Prior work has focused either on analysing lexical semantic relations in word embeddings or on probing pretrained language models (PLMs), with some exceptions. Given the rarity of highly multilingual benchmarks, it is unclear to what extent PLMs capture relational knowledge and are able to transfer it across languages. To start addressing this question, we propose MultiLexBATS, a multilingual parallel dataset of lexical semantic relations adapted from BATS in 15 languages including low-resource languages, such as Bambara, Lithuanian, and Albanian. As an experiment on cross-lingual transfer of relational knowledge, we test the PLMs’ ability to (1) capture analogies across languages, and (2) predict translation targets. We find considerable differences across relation types and languages, with a clear preference for hypernymy and antonymy as well as Romance languages.
Main authors: Gromann, Dagmar
Other authors: Oliveira, Hugo Gonçalo; Pitarch, Lucia; Apostol, Elena-Simona; Bernad, Jordi; Bytyçi, Eliot; Cantone, Chiara; Carvalho, Sara; Frontini, Francesca; Garabik, Radovan; Gracia, Jorge; Granata, Letzia; Khan, Fahad; Knez, Timotej; Labropoulou, Penny; Liebeskind, Chaya; Buono, Maria Pia di; Anić, Ana Ostroški; Rackevičienė, Sigita; Rodrigues, Ricardo; Sérasse, Gilles; Selmistraitis, Linas; Sidibé, Mahammadou; Silvano, Purificação; Spahiu, Blerina; Sogutlu, Enriketa; Stanković, Ranka; Truică, Ciprian-Octavian; Oleškevičienė, Giedrė Valūnaitė; Zitnik, Slavko; Zdravkova, Katerina
Subject: Lexical Semantic Relations Multilingual Benchmark BATS
Year: 2024
Country: Portugal
Document type: conference paper
Access type: open access
Associated institution: Instituto Politécnico de Coimbra
Language: English
Source: Instituto Politécnico de Coimbra