Document Details

Evaluating the performance and improving the usability of parallel and distributed Word Embeddings tools

Author(s): Silva, Mateus; Meyer, Vinicius; Kirchoff, Dionatrã; Neto, Joaquim; Vieira, Renata; De Rose, Cesar

Date: 2021

Persistent Identifier: http://hdl.handle.net/10174/29056

Source: Repositório Científico da Universidade de Évora (Scientific Repository of the University of Évora)

Subject(s): Language models


Description

The representation of words as vectors, known as Word Embeddings (WE), has received considerable attention in the Natural Language Processing (NLP) field. WE models can express syntactic and semantic similarities between words, as well as their relationships and contexts within a given corpus. Although the most popular implementations of WE algorithms scale poorly, newer approaches apply High-Performance Computing (HPC) techniques. This creates an opportunity to analyze the main differences among the existing implementations using performance and scalability metrics. In this paper, we present a study of the resource utilization and performance of well-known WE algorithms from the literature. To improve scalability and usability, we propose a wrapper library for local and remote execution environments that integrates a set of optimized implementations, namely pWord2vec, pWord2vec MPI, and Wang2vec, alongside the original Word2vec algorithm. With these optimizations, an average performance gain of 15x on multicore and 105x on multinode environments is achieved compared with the original version. The memory footprint is also substantially reduced compared with the most popular Python implementations.
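The abstract notes that WE models capture similarity between words. One common way to measure that similarity is the cosine similarity between word vectors. The minimal sketch below shows this computation. The words and the three-dimensional vectors are made-up toy values chosen for illustration. They are not output from any of the tools evaluated in the paper, and real WE models typically use vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return dot(u, v) / (|u| * |v|), the cosine of the angle between u and v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (illustrative values, not trained vectors).
embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

if __name__ == "__main__":
    # Related words should point in similar directions and score closer to 1.
    print(cosine_similarity(embeddings["king"], embeddings["queen"]))
    print(cosine_similarity(embeddings["king"], embeddings["apple"]))
```

With these toy values, the "king"/"queen" pair scores much higher than the "king"/"apple" pair. This mirrors the intuition that a trained model places semantically related words close together in vector space.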

Document Type: Scientific article
Language: English
