Publication
Data Mining Project to improve Soft Skills Acquisition and Engineering Education. Evaluation of CTCT Course Results
| Abstract: | This dissertation explores the potential of educational data mining (EDM) to enhance the engineering curriculum, specifically within the Transversal Skills in Science and Technology mandatory program for first-year students at NOVA School of Science and Technology. It employs machine learning (ML) algorithms, including classification and regression, to analyze student performance data. Various research concerns were addressed, such as the impact of gender on course selection, the relationships between different assessment points, performance variability, and the predictive power of early-course outcomes. Some of these questions were resolved through statistical analysis, while others leveraged machine learning models. Key findings reveal a distinct gender-based difference in course selection, with male students typically opting for engineering fields like mechanical, electrical, and informatics, while female students gravitated towards biology and chemistry-related courses. Additionally, activity points proved to be the most reliable predictor of final grades, whereas self- and peer-assessment points exhibited lower correlations, indicating a shift in the weighting of these points in the evaluation system. A detailed analysis of demographic and performance data showed that male students had a higher rate of failure than female students, despite enrolling in larger numbers. While the clustering analysis identified potential trends in early-course data, it did not provide substantial actionable insights in isolation. However, its integration into predictive models could offer valuable contributions in forecasting student performance in future research. The machine learning models demonstrated the effectiveness of early-course data in predicting final grades, with Random Forest and CatBoost models consistently outperforming others. The dissertation emphasizes the technical capabilities of platforms such as Google Colaboratory, highlighting its cost-effectiveness and efficiency for conducting large-scale educational data analysis using mid-range Graphics Processing Units (GPUs). Moreover, the research underscores the importance of standardizing data collection through Learning Management Systems (LMS), specifically Moodle, which can enhance the precision and efficiency of educational data analysis. The research concludes with recommendations for future work, including the development of Moodle plugins to track student performance in real time, further exploration of machine learning models for grade prediction, and an in-depth analysis of computational resource costs. This study contributes to the field of Educational Data Mining (EDM) and provides practical insights for improving Science, Technology, Engineering, and Mathematics (STEM) education through data-driven strategies and the integration of Artificial Intelligence (AI). |
|---|---|
| Main authors: | Dias, Alice Rodrigues |
| Subject: | Artificial Intelligence in Education; Educational Data Mining; Knowledge Discovery; Google Colaboratory; Machine Learning; STEM Education |
| Year: | 2024 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |