European Journal of Statistics and Probability (EJSP)

XGBoost

Feature Importance Analysis for Student Dropout Prediction Using Principal Component Analysis (Published)

Student dropout remains a persistent challenge in tertiary education, particularly in developing countries where early identification of at-risk students is often limited by inadequate analytical frameworks. This study presents a Principal Component Analysis (PCA)-based feature importance framework for identifying key determinants of student dropout in Nigerian polytechnics. Using a dataset of 2,200 student records obtained from Federal Polytechnic Ukana and Akwa Ibom State Polytechnic, PCA was applied to reduce dimensionality, eliminate redundancy, and reveal the most influential factors contributing to student attrition. The analysis extracted sixteen principal components from an initial set of twenty-two variables, collectively accounting for approximately 93.26% of the total variance in the dataset. The first principal component, largely dominated by class attendance, explained 12.14% of the total variance, indicating its strong influence on student persistence. This was followed by previous academic performance (10.02%), study hours per day (8.66%), internet access at home (8.04%), and performance in the previous semester (7.58%). Other notable contributors included residential status, parental educational background, motivation level, and confidence in current courses. Variables such as gender and extracurricular participation contributed minimally to variance, indicating a weaker influence on dropout outcomes. The PCA results demonstrate that academic engagement and learning behavior factors contribute more to student dropout risk than demographic characteristics. By transforming correlated variables into orthogonal components, PCA enhanced interpretability and revealed latent structures underlying student performance patterns. The cumulative variance explained confirms that a reduced set of features can effectively represent student dropout behavior without substantial information loss.
This study highlights the effectiveness of Principal Component Analysis as a robust analytical tool for understanding student dropout dynamics and supporting data-driven decision-making in higher education. The findings provide empirical evidence for developing early warning systems and targeted intervention strategies aimed at improving student retention, particularly within resource-constrained educational environments.

Keywords: Nigerian higher education, Random Forest, XGBoost, intelligent analytics, machine learning, student dropout prediction
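
The abstract above reports per-component shares of variance and a cumulative share for the retained components. As a rough illustration of that bookkeeping only, the following Python sketch turns a list of eigenvalues into explained-variance ratios and counts how many leading components are needed to reach a target cumulative share. The function names and the toy eigenvalues are hypothetical and are not taken from the study; the sketch does not perform the eigendecomposition itself.

```python
def explained_variance_ratios(eigenvalues):
    """Return each component's share of total variance, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    return [value / total for value in ordered]


def components_for_threshold(ratios, threshold):
    """Return the smallest number of leading components whose cumulative
    share of variance reaches the threshold (a fraction between 0 and 1)."""
    cumulative = 0.0
    for count, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return count
    # Fall back to all components if rounding keeps the sum just below 1.
    return len(ratios)


# Hypothetical eigenvalues for a four-variable toy problem (not study data).
toy_eigenvalues = [2.0, 1.0, 0.5, 0.5]
ratios = explained_variance_ratios(toy_eigenvalues)
needed = components_for_threshold(ratios, 0.85)
print(ratios, needed)
```

In practice the eigenvalues would come from the covariance or correlation matrix of the standardized variables, and the threshold would be chosen by the analyst.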

Predictive Modeling of Students’ Dropout Risk Using Intelligent Analytics (Published)

Student dropout is a persistent challenge in higher education, particularly in developing countries such as Nigeria, where reactive institutional responses often fail to identify at-risk students in time. This study proposes an intelligent analytics-based predictive modeling framework designed to shift institutional strategies from reactive responses to proactive early intervention. Using a dataset of 2,200 student records from Federal Polytechnic Ukana and Akwa Ibom State Polytechnic, the research evaluates the effectiveness of two ensemble learning algorithms: Random Forest (RF) and Extreme Gradient Boosting (XGBoost). The methodology involved data preprocessing, including Min-Max normalization and Principal Component Analysis (PCA), which identified 16 key predictors from an initial 22 variables. These variables spanned academic performance, demographic backgrounds, and behavioral patterns. Experiments conducted in a Python environment showed that XGBoost outperformed RF across all evaluation metrics. XGBoost achieved an accuracy of 0.92, precision of 0.91, recall of 0.90, and an F1-score of 0.91, compared to RF’s accuracy of 0.87. Feature importance analysis highlighted “Attendance in Classes” and “Previous Academic Results” as the most significant predictors of attrition. The study concludes that intelligent analytics can effectively capture nonlinear relationships in student data to provide actionable insights. This framework offers a scalable solution for Nigerian tertiary institutions to implement evidence-based retention strategies, ultimately improving graduation outputs and institutional efficiency.

Keywords: Nigerian higher education, Random Forest, XGBoost, educational data mining, intelligent analytics, predictive modeling, student dropout risk
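
The abstract above mentions Min-Max normalization and reports accuracy, precision, recall, and F1-score. For readers unfamiliar with these quantities, the Python sketch below shows one common way to compute them from scratch. It is an illustration under stated assumptions, not the authors' code: the function names, the label convention (1 = dropout), and the toy labels are all hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels (1 = dropout, 0 = retained); not study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
scaled = min_max_normalize([0, 5, 10])
print(metrics, scaled)
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which is a quick sanity check when comparing reported scores for models such as RF and XGBoost.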
