Publication

Improving NLTK for Processing Portuguese

Bibliographic Details
Summary:	Python has a growing community of users, especially in the AI and ML fields. Yet computational processing of Portuguese in this programming language is limited, in both available tools and results. This paper describes NLPyPort, an NLP pipeline in Python, primarily based on NLTK and focused on Portuguese. It is mostly assembled from pre-existing resources or their adaptations, but it improves on the performance of existing Python alternatives, namely in tokenization, PoS tagging, lemmatization and NER.
Main Authors:	Ferreira, João
Other Authors:	Oliveira, Hugo Gonçalo; Rodrigues, Ricardo
Subject:	NLP; Tokenization; PoS tagging; Lemmatization; Named Entity Recognition
Year:	2019
Country:	Portugal
Document type:	conference output
Access type:	open access
Associated institution:	Instituto Politécnico de Coimbra
Language:	English
Origin:	Instituto Politécnico de Coimbra
