Intelligent Ensemble Learning Framework for Prediction of Students Academic Performance Using Extreme Gradient Boosting and Random Forest Algorithms (Published)

Article Author: Utibe Peter Inyang and Ekemini Anietie Johnson

Predicting students’ academic achievement is a key component of educational data mining (EDM) and learning analytics. By applying machine learning approaches, institutions can improve overall learning outcomes, identify at-risk students, and carry out focused interventions. The Intelligent Ensemble Learning Framework presented in this paper combines Extreme Gradient Boosting (XGBoost) and Random Forest (RF) to increase prediction accuracy. XGBoost, a powerful boosting strategy noted for its effectiveness in managing large datasets and minimizing overfitting, is paired with RF, which combines multiple decision trees to reduce variance and improve model stability. The study uses data gathered from Federal Polytechnic Ukana on 400 students, including attendance, demographics, academic records, and other pertinent characteristics. Data preprocessing included normalization and feature selection using Principal Component Analysis (PCA), after which sixteen important features were retained based on their eigenvalues and explained variance. The dataset was divided into training (80%) and testing (20%) subsets, and a bagging technique was used to create the ensemble model. Experimental results demonstrate that the ensemble model outperforms the individual RF and XGBoost models in predicting students’ cumulative grade point average (CGPA). The performance evaluation is based on standard regression metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), R-Squared Score (R²), Explained Variance Score (EVS), and Median Absolute Error (MedAE). The ensemble model achieved an R² score of 0.9900, outperforming RF (0.9888) and XGBoost (0.9800). Visualizations using scatter plots, grouped bar charts, and heat maps further support the effectiveness of the proposed approach.
This research contributes to the growing body of work in machine learning applications in education, demonstrating the potential of ensemble regression models in academic performance prediction. The findings underscore the importance of advanced predictive models in educational institutions, facilitating proactive decision-making and student support strategies to enhance academic success.

Keywords: Academic Performance, Ensemble learning, Framework, Intelligence, Prediction, extreme gradient boosting and random forest
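
The five regression metrics named in the abstract can be computed directly from paired true and predicted values. The following is a minimal sketch in plain Python, not the authors' implementation. The CGPA values and the two sets of model predictions are made-up illustrative numbers, and combining the two models by simple averaging is an assumption (the abstract states only that a bagging technique was used).

```python
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, R2, EVS and MedAE for paired true/predicted values.

    Assumes y_true is not constant (otherwise R2 and EVS are undefined).
    """
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    y_bar = mean(y_true)
    e_bar = mean(errors)
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)    # total sum of squares
    ss_err = sum((e - e_bar) ** 2 for e in errors)    # n * variance of the errors
    return {
        "MSE": ss_res / len(errors),
        "MAE": mean(abs_errors),
        "R2": 1 - ss_res / ss_tot,
        "EVS": 1 - ss_err / ss_tot,
        "MedAE": median(abs_errors),
    }


# Illustrative values only; none of these numbers come from the study.
y_true = [2.0, 3.0, 4.0, 5.0]      # hypothetical CGPA values
rf_pred = [2.2, 2.9, 4.1, 4.8]     # hypothetical Random Forest predictions
xgb_pred = [1.8, 3.3, 3.9, 5.2]    # hypothetical XGBoost predictions

# Assumed combination rule: average the two models' predictions.
ensemble_pred = [(a + b) / 2 for a, b in zip(rf_pred, xgb_pred)]

for name, pred in [("RF", rf_pred), ("XGBoost", xgb_pred), ("Ensemble", ensemble_pred)]:
    print(name, regression_metrics(y_true, pred))
```

In this toy data the two models err in opposite directions, so averaging cancels much of the error. That is the general intuition behind combining models, not evidence about the reported results.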

Performance Comparison of Xgboost and Random Forest for The Prediction of Students Academic Performance (Published)

Article Author: Utibe Peter Inyang and Ekemini Anietie Johnson

In educational data mining and learning analytics, predicting student academic performance is essential because it provides stakeholders with useful information to improve educational outcomes. This study assesses and compares the effectiveness of two popular machine learning algorithms, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), in predicting students’ academic results. Data preparation methods, namely principal component analysis (PCA) and feature normalization, were applied to a real-world dataset of 400 records gathered from six departments at Federal Polytechnic Ukana. Sixteen input features were selected for analysis based on their eigenvalues and explained variance. Eighty percent (80%) of the dataset was used for training, and the remaining twenty percent (20%) was used for testing. The models were evaluated using Mean Squared Error (MSE), Mean Absolute Error (MAE), R-Squared Score (R²), Explained Variance Score (EVS), and Median Absolute Error (MedAE). The findings show that both models have strong predictive power, with RF marginally outperforming XGBoost on the key metrics. The results highlight the potential of data-driven approaches to improve student outcomes and offer evidence-based guidance for choosing machine learning models in educational predictive analytics.

Keywords: Academic Performance, Performance, Prediction, Random Forest, Students, extreme gradient boosting
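
Two preprocessing steps from the abstract lend themselves to a short illustration: choosing components by explained variance, and the 80/20 split of 400 records. The sketch below uses plain Python. The eigenvalues, the 0.9 threshold, and the function names are hypothetical, since the abstract does not state the selection threshold or how the split was drawn.

```python
import random


def explained_variance_ratio(eigenvalues):
    """Share of total variance carried by each component, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [value / total for value in ordered]


def components_for_threshold(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    cumulative = 0.0
    for count, ratio in enumerate(explained_variance_ratio(eigenvalues), start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return count
    return len(eigenvalues)


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and return (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical eigenvalues, chosen only to make the arithmetic easy to follow.
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
print(explained_variance_ratio(eigenvalues))
print(components_for_threshold(eigenvalues, 0.9))

# 400 placeholder record ids stand in for the student records.
train, test = train_test_split(range(400))
print(len(train), len(test))
```

With 400 records and a test fraction of 0.2, the split yields 320 training and 80 testing records.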

A Model for Estimation of Malaria Prediction in North-East Zone, Nigeria (Published)

Article Author: Jimoh, A. A., Salawu, S. A., Ojuope, K. I., and Ahmed, M. O.

This study examined the Autoregressive Integrated Moving Average (ARIMA) model for malaria prediction in the North-Eastern geo-political zone of Nigeria. A cross-sectional research design was adopted, as the data were collected over a specific period of time. The datasets were collected from Federal Medical Centre, Azare and Federal Teaching Hospital, Gombe, and span five years (2018 – 2022). The datasets were divided into an 80% training set and a 20% testing set. ARIMA was used for estimation, and the best model was found to be ARIMA(2,2,2). The experiment was conducted in RStudio. The model was diagnosed and cross-checked for accuracy using the Box-Ljung statistic, a normality curve, and ACF and PACF plots. ARIMA(2,2,2) was then used to predict malaria incidence three years ahead. The results showed that malaria cases were high in January 2023 with 305 cases (LCI=288 & UCI=898 cases). In 2024, malaria cases would be high in December with 38 cases (LCI=653 & UCI=781 cases). In 2025, malaria cases would again be high in December with 53 cases (LCI=714 & UCI=808 cases). Malaria cases were also found to increase as the year progresses. The Mean Absolute Percentage Error (MAPE) of ARIMA(2,2,2) was estimated at 12.43%, which implies that the model has 87.57% accuracy. Based on the findings, it is recommended that governments and NGOs provide more treated mosquito nets and medications to reduce malaria infections in the zone.

Keywords: ARIMA, Estimation, MAPE, Malaria, Model, Prediction, r-studio
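
The abstract derives its accuracy figure from MAPE, reading accuracy as 100% minus the MAPE. The sketch below shows that calculation in plain Python. The actual and forecast values are invented for illustration, and the function names are hypothetical. Only the 12.43% figure comes from the abstract.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    Undefined when any actual value is zero, so that case raises ValueError.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100 * sum(ratios) / len(ratios)


def accuracy_from_mape(mape_percent):
    """Accuracy as used in the abstract: 100 minus MAPE (an informal convention)."""
    return 100 - mape_percent


# Invented monthly case counts and forecasts, for illustration only.
actual = [100, 200, 400]
forecast = [110, 180, 400]
print(round(mape(actual, forecast), 2))

# The reported MAPE of 12.43% corresponds to the reported 87.57% accuracy.
print(round(accuracy_from_mape(12.43), 2))
```

Because MAPE divides by the actual value, months with very few cases can dominate the average, which is worth keeping in mind when reading a single accuracy percentage.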

Dynamic Decision Tree Based Ensembled Learning Model to Forecast Flight Status (Published)

Article Author: C. Ugwu, Ntuk, Ekaete

This paper describes the development of an enhanced predictive classifier for flight status that reduces the overfitting observed in existing models. A dynamic approach based on the bagging algorithm, an ensemble learning technique, was used to train a number of base learners with a base learning algorithm. The outputs of the individual classifiers were combined by majority voting, and the class with the most votes was taken as the final output. The decision tree algorithm was applied to replica sets generated from the training set to create the individual decision tree models. Object-Oriented Analysis and Design (OO-AD) methodology was adopted for the design, and the implementation was done in the C# programming language. The result was favorable: the model predicted with an accuracy of 78.3%, against 68.2% for the existing systems, which indicates an improvement.

Keywords: Flight Status, Bagging Algorithm, Classification, Ensemble learning, Prediction
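
The bagging procedure described in the abstract (replica sets drawn from the training set, one tree per replica, majority vote over the trees) can be sketched in a few lines. This is an illustrative Python sketch, not the paper's C# system. The single feature, the labels, the toy data, and the use of one-split trees (decision stumps) as base learners are all assumptions made to keep the example short.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw a replica set: len(data) items sampled with replacement."""
    return [rng.choice(data) for _ in data]


def fit_stump(sample):
    """Fit a one-split decision tree on (feature, label) pairs."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x <= threshold]
        right = [y for x, y in sample if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bagging_fit(data, n_models=25, seed=0):
    """Train one stump per bootstrap replica of the training data."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def majority_vote(models, x):
    """Return the class predicted by the largest number of models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (minutes of inbound delay, flight status).
training = [
    (5, "on_time"), (10, "on_time"), (12, "on_time"), (20, "on_time"),
    (35, "delayed"), (40, "delayed"), (55, "delayed"), (60, "delayed"),
]
models = bagging_fit(training)
print(majority_vote(models, 5), majority_vote(models, 60))
```

Each stump sees a slightly different replica of the data, so individual split points vary. The vote smooths out those differences, which is how bagging limits overfitting.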

Prediction