Publication
Avoiding question-answering congestion on health services using chatbots
| Abstract: | Social networks spread a large volume of fake news and misinformation every day, and the COVID-19 pandemic confirmed this. Widespread lack of knowledge about the disease fuels misleading information, which harms people's lives and hampers governments' efforts to contain it. To cope with this infodemic, people turned to the health services' phone lines and congested them with questions, most of which were repeated across individuals and locations. A chatbot for COVID-19-related questions would take this workload off the health services and mitigate the congestion. The chatbot should work in both English and Portuguese. This work provides a background overview of web crawlers, information processing, and chatbot development, the three components of the application. A systematic literature review was carried out to analyse the existing literature on these topics. The application consists of three main modules: a web crawler, built on the ACHE crawler application, which downloads web pages from trustworthy sources; a text processor, which parses the web pages and indexes each one in the ElasticSearch index for its language; and a chatbot component, composed of a BERT model fine-tuned on the SQuAD 2.0 dataset and a web interface that queries the ElasticSearch indexes for the most relevant pages and extracts answers to users' questions. To meet the English and Portuguese requirement, two sets of reliable sources were defined (one per language), and a translated version of the SQuAD 1.1 dataset was used to train the Portuguese BERT model. The chatbot selects the correct model based on the language set in the web browser. The system was evaluated on a set of COVID-19 QA pairs extracted from the United Nations website, and the results are described in this work. These results fell well short of the desired outcomes, so improvements were applied to the crawler and to the ElasticSearch indexes. However, the results were still not satisfactory, and a set of future modifications is presented in this work. |
|---|---|
| Main authors: | Pereira, Henrique Manuel Palmeira |
| Subject: | Chatbot; Information processing; Natural language processing; COVID-19; Processamento da informação; Processamento de linguagem natural |
| Year: | 2022 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | English |
| Source: | RepositóriUM - Universidade do Minho |
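
The abstract states that the chatbot picks the English or Portuguese model from the language set in the web browser, but it does not say how that language is read. The sketch below is one possible reading, assuming the language arrives as an HTTP `Accept-Language` header value. The function names, the index names (`covid-en`, `covid-pt`), and the model labels are hypothetical placeholders, not identifiers taken from the dissertation.

```python
# Hypothetical mapping from language to the ElasticSearch index and QA model.
# The names below are illustrative assumptions, not the dissertation's own.
RESOURCES = {
    "en": {"index": "covid-en", "model": "bert-squad2-en"},
    "pt": {"index": "covid-pt", "model": "bert-squad1.1-pt"},
}


def pick_language(accept_language, supported=("en", "pt"), default="en"):
    """Return the preferred supported language for an Accept-Language value."""
    candidates = []
    for position, part in enumerate(accept_language.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality <= 0:
            continue
        # Reduce regional tags such as "pt-PT" or "en-GB" to the primary subtag.
        primary = tag.strip().lower().split("-")[0]
        candidates.append((-quality, position, primary))
    # Highest quality first; ties keep the order in which the browser sent them.
    for _, _, primary in sorted(candidates):
        if primary in supported:
            return primary
    return default


def route(accept_language):
    """Return the chosen language together with its index and model names."""
    language = pick_language(accept_language)
    return language, RESOURCES[language]


if __name__ == "__main__":
    print(route("pt-PT,pt;q=0.9,en;q=0.8"))
```

Falling back to English for unsupported or missing languages is a choice made for this sketch only; the abstract does not describe the fallback behaviour.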