Performance Comparison of Xgboost and Random Forest for The Prediction of Students Academic Performance (Published)

Article Author: Utibe Peter Inyang and Ekemini Anietie Johnson

In educational data mining and learning analytics, predicting student academic performance is essential because it provides stakeholders with useful information to improve educational outcomes. To predict students’ academic results, this study assesses and contrasts the effectiveness of two popular machine learning algorithms: Random Forest (RF) and Extreme Gradient Boosting (XGBoost). Data preparation methods, such as principal component analysis (PCA) and feature normalization, were used to enhance a real-world dataset of 400 records gathered from six departments at Federal Polytechnic Ukana. Based on their eigenvalues and explained variance, sixteen crucial input features were chosen for examination. Eighty percent (80%) of the dataset was used for training, and the remaining twenty percent (20%) was used for testing. To evaluate the performance of the models, evaluation metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), R-Squared Score (R²), Explained Variance Score (EVS), and Median Absolute Error (MedAE) were used. The findings show that both models have strong predictive power, with RF marginally outperforming XGBoost on key metrics. The results highlight the potential of data-driven tactics to enhance student outcomes and offer evidence-based suggestions for choosing machine learning models in educational predictive analytics.
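The five evaluation metrics named in this abstract can be illustrated with a minimal sketch using only the Python standard library. The function name `regression_metrics` and the toy grade values are assumptions made for illustration; they are not taken from the study.

```python
from statistics import mean, median, pvariance


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, R², EVS and MedAE for two equal-length sequences."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    var_y = pvariance(y_true)  # population variance of the targets

    mse = mean([e * e for e in errors])
    return {
        "MSE": mse,
        "MAE": mean(abs_errors),
        "MedAE": median(abs_errors),
        # R² = 1 - SS_res / SS_tot, written here as a ratio of means
        "R2": 1 - mse / var_y,
        # EVS = 1 - Var(y - y_hat) / Var(y)
        "EVS": 1 - pvariance(errors) / var_y,
    }


# Hypothetical values, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

R² and EVS coincide whenever the prediction errors average to zero, as in this toy example; they differ when a model is systematically biased, which is one reason to report both.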

Keywords: Academic Performance, Performance, Prediction, Random Forest, Students, extreme gradient boosting

An Intelligent Analytic Framework for Predicting Students Academic Performance Using Multiple Linear Regression and Random Forest (Published)

Article Author: Ekemini A. Johnson, Jude A. Inyangetoh, Habeeb A. Rahmon, Tope G. Jimoh, Eduediuyai E. Dan, Mfon O. Esang

In the contemporary educational landscape, data-driven decision-making has become pivotal for enhancing student success. This article explores an intelligent analytic framework leveraging Multiple Linear Regression (MLR) and Random Forest (RF) algorithms to predict student performance, providing a comparative analysis of their predictive capabilities. MLR, a statistical technique, models the relationship between students’ grades and various factors such as attendance and socio-economic background, offering transparency and interpretability of the impact of each predictor. RF, an ensemble learning method, excels in handling large datasets and capturing non-linear interactions among variables, offering higher accuracy in prediction. The study was conducted using 664 records from eight departments of Federal Polytechnic Ukana, following rigorous data preprocessing and normalization. The performance of both models was evaluated based on metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), R-squared Score (R²), and Explained Variance Score (EVS). The results revealed that RF outperformed MLR significantly, with lower error rates and higher predictive accuracy. Scatter plots and bar charts further illustrated the robust performance of RF over MLR. This research underscores the potential of integrating advanced machine learning techniques in educational settings to provide deeper insights into student performance, enabling timely and targeted interventions. The findings advocate for the adoption of RF for more accurate predictions and improved educational outcomes. Future research should explore hybrid models and expand the dataset to validate the applicability of these findings across diverse educational contexts.

Keywords: Academic Performance, Random Forest, machine learning and ensemble learning, multiple linear regression

Data-Driven Framework for Crop Categorization using Random Forest-Based Approach for Precision Farming Optimization (Published)

Article Author: Adefemi Joshua Ayoola, Joe Essien, Martin Ogharandukun, Felix Uloko

Making incorrect choices when selecting crops can result in substantial financial losses for farmers, primarily because of a limited understanding of the unique needs of each crop. Each farm possesses unique characteristics, influencing the effectiveness of modern agricultural solutions. Challenges persist in optimizing farming methods to maximize yield. This study aims to mitigate these issues by developing a data-driven crop classification and cultivation advisory system, leveraging machine learning algorithms and agricultural data. By analysing variables such as soil nutrient levels, temperature, humidity, pH, and rainfall, the system offers tailored recommendations for crop selection and cultivation practices. This approach optimizes resource utilization, enhances crop productivity, and promotes sustainable agriculture. The study emphasizes the importance of pre-processing data, such as handling missing values and normalizing features, to ensure reliable model training. Various machine learning models, including Random Forests, Bagging Classifier, and AdaBoost Classifier, were employed, demonstrating high accuracy rates in crop classification tasks. The integration of real-time weather data, market prices, and profitability analysis further refines decision-making, while a mobile application facilitates convenient access for farmers. By incorporating user feedback and continuous data collection, the system’s performance can be continuously improved, offering precise and economically viable agricultural advice.
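The model comparison described in this abstract might look roughly like the following sketch, which assumes scikit-learn is available. The data are synthetic, the feature names (including the split of soil nutrients into N, P and K) and all hyperparameters are assumptions, and nothing here reproduces the authors' dataset or results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names; the abstract lists the variable types only.
FEATURES = ["soil_n", "soil_p", "soil_k", "temperature", "humidity", "ph", "rainfall"]

# Synthetic stand-in for a crop dataset with three crop classes (assumption).
X, y = make_classification(
    n_samples=600,
    n_features=len(FEATURES),
    n_informative=5,
    n_redundant=0,
    n_classes=3,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Normalize features; fit the scaler on the training split only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

# Train each ensemble and record its accuracy on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test_s))

for name, acc in accuracies.items():
    print(f"{name}: accuracy = {acc:.3f}")
```

Tree ensembles do not strictly need feature scaling; the scaler is kept only to mirror the normalization step the abstract mentions.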

Keywords: Random Forest, crop classification, cultivation advisory, machine learning, precision farming.

A model for Real Estate Price Prediction using Multi-Level Stacking Ensemble Technique (Published)

Article Author: Lesi Nnadozie, Daniel Matthias and E.O. Bennett

Recent research and economic publications have shown the impact of real estate investment on the overall economy of Nigeria. It is therefore crucial to employ machine learning techniques to predict the prices of real estate properties. Real estate price analysis and prediction will assist in establishing real estate policies and can also help real estate stakeholders make informed decisions without bias or prejudice. Thus, it is imperative to develop a model that improves the accuracy of real estate price prediction. The goal of this research is to develop a model using a multi-level stacking ensemble technique to predict the price of real estate property. The dataset utilized for the study was collected from transactions done by real estate firms in Port Harcourt and consists of 1053 rows with twelve features. The base models used include Random Forest (RF), Extreme Gradient Boosting Algorithm (XGBoost), Light Gradient Boosting Machine (LightGBM), Decision Tree regression and ElasticNet regression. Various combinations of the base models were stacked using StackingCVRegressor. The final model was developed by combining the best-performing stacked models and was evaluated using R-squared, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Square Error (MSE) and training time. The proposed model outperformed the individual base models, with an R-squared of 0.985203, MSE of 0.013438, RMSE of 0.115923, MAE of 0.063411 and a training time of 0.599398 seconds. The results show that multi-level stacking significantly improves the accuracy of a model. It was also observed that stacking improves accuracy at the cost of computational time. Stacking with a blending function for the proposed model significantly reduced the training time to 0.599398 seconds, compared with 107.054931 seconds when using StackingCVRegressor. Therefore, the multi-level stacking ensemble technique can be employed to improve the predictive accuracy of a prediction model. Future work could increase the size of the dataset and the number of features.
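The contrast between cross-validated stacking and blending drawn in this abstract can be sketched as follows, assuming scikit-learn is installed. This is not the authors' pipeline: scikit-learn's `StackingRegressor` stands in for mlxtend's `StackingCVRegressor`, XGBoost and LightGBM are omitted to keep dependencies light, only a single stacking level is shown, and the data are synthetic with the same shape (1053 rows, twelve features) as the dataset described.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption), not the Port Harcourt transactions.
X, y = make_regression(n_samples=1053, n_features=12, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def make_base_models():
    """Fresh, unfitted base learners (hyperparameters are illustrative)."""
    return [
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("tree", DecisionTreeRegressor(max_depth=6, random_state=0)),
        ("enet", ElasticNet(alpha=0.1)),
    ]


# Cross-validated stacking: the meta-learner is trained on out-of-fold
# predictions, so every base model is refitted once per fold.
stack = StackingRegressor(
    estimators=make_base_models(), final_estimator=LinearRegression(), cv=5
)
stack.fit(X_train, y_train)
stack_r2 = r2_score(y_test, stack.predict(X_test))

# Blending: base models are fitted once, and the meta-learner is trained on
# their predictions for a single held-out slice of the training data.
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0
)
blend_bases = [model.fit(X_fit, y_fit) for _, model in make_base_models()]


def base_predictions(models, features):
    """Stack each base model's predictions as one column per model."""
    return np.column_stack([m.predict(features) for m in models])


meta = LinearRegression().fit(base_predictions(blend_bases, X_hold), y_hold)
blend_r2 = r2_score(y_test, meta.predict(base_predictions(blend_bases, X_test)))

print(f"stacking R^2: {stack_r2:.3f}")
print(f"blending R^2: {blend_r2:.3f}")
```

Blending fits each base model once rather than once per fold, which is consistent with the shorter training time the abstract reports; the trade-off is that the meta-learner sees fewer training rows.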

Citation: Nnadozie L., Matthias D., and Bennett E.O. (2022) A model for Real Estate Price Prediction using Multi-Level Stacking Ensemble Technique, European Journal of Computer Science and Information Technology, Vol.10, No.3, pp.33-46

Keywords: Extreme Gradient Boosting Algorithm (XGBoost), Multi-level Stacking Ensemble Technique, Random Forest, Real Estate, machine learning

Random Forest