| Abstract: | This work centers on Named Entity Recognition (NER), an important task in the field of Machine Translation (MT). The work was conducted at Unbabel, an international company offering artificial intelligence-powered human translation, and allowed us to evaluate the performance of the NER system when dealing with Named Entities (NEs) in Simplified Chinese. Two experiments were conducted together with Unbabel’s Natural Language Processing (NLP) team to determine the best way to develop a NER model for Simplified Chinese. Two methods are proposed: training the model directly on gold standards created from human-annotated data, and training the model on gold standards built using NE projection with a word aligner. Both experiments rely on NE annotation, which was performed by a professional linguist; the annotated data serves as the gold standard for the subsequent experiments. The first experiment tests the viability of the first method: manually annotated data from the NE annotation task is used to train the NER model. The second experiment tests the viability of training the NER system with gold standards built by the aligner Simalign; here, data from the NE annotation task is again used as the gold standard, this time to evaluate the aligner's performance on the NE projection task. The performance of both the NER model and the aligner is evaluated according to standard performance metrics. We found that although the NER model achieved promising overall results when trained on manually annotated data, there was still considerable room for improvement. Simalign, on the other hand, yielded very satisfactory results on the NE projection task. Due to time constraints, we did not train the NER model with the data obtained from Simalign. 
However, the results show that it is a very suitable aligner for NE projection in Simplified Chinese and suggest that using an aligner is a viable way to train a Chinese NER model. We are optimistic that this method can surpass the first one. The results were integrated into two core projects, MAIA (Graça et al., 2020) and the Center for Responsible AI, due to privacy issues with NEs. The results of these experiments provide essential insights for future NER development, which can have a positive impact on overall MT quality. |