Feature Importance Analysis for Student Dropout Prediction Using Principal Component Analysis (Published)
Student dropout remains a persistent challenge in tertiary education, particularly in developing countries where early identification of at-risk students is often limited by inadequate analytical frameworks. This study presents a Principal Component Analysis (PCA)-based feature importance framework for identifying key determinants of student dropout in Nigerian polytechnics. Using a dataset of 2,200 student records obtained from Federal Polytechnic Ukana and Akwa Ibom State Polytechnic, PCA was applied to reduce dimensionality, eliminate redundancy, and reveal the most influential factors contributing to student attrition.
The analysis extracted sixteen principal components from an initial set of twenty-two variables, collectively accounting for approximately 93.26% of the total variance in the dataset. The first principal component, largely dominated by class attendance, explained 12.14% of the total variance, suggesting a strong influence on student persistence. The next components were associated with previous academic performance (10.02%), study hours per day (8.66%), internet access at home (8.04%), and performance in the previous semester (7.58%). Other notable contributors included residential status, parental educational background, motivation level, and confidence in current courses. Variables such as gender and extracurricular participation contributed minimally to the variance, indicating a weaker influence on dropout outcomes. The PCA results indicate that academic engagement and learning behavior factors contribute more to student dropout risk than demographic characteristics. By transforming correlated variables into orthogonal components, PCA enhanced interpretability and revealed latent structures underlying student performance patterns. The cumulative variance explained confirms that a reduced set of features can represent student dropout behavior without substantial information loss.
This study highlights the effectiveness of Principal Component Analysis as a robust analytical tool for understanding student dropout dynamics and supporting data-driven decision-making in higher education. The findings provide empirical evidence for developing early warning systems and targeted intervention strategies aimed at improving student retention, particularly within resource-constrained educational environments.
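The variance figures reported above come from the standard PCA quantities: each component's share of variance is its eigenvalue divided by the sum of all eigenvalues, and the variable with the largest absolute loading is the one that "dominates" the component. The following is a minimal sketch of that calculation, not the authors' pipeline. It assumes standardized inputs and uses synthetic random data shaped like the study's dataset (2,200 rows, 22 columns); the function names `pca_explained_variance` and `components_for_threshold` and the 90% threshold are hypothetical choices made for illustration.

```python
import numpy as np


def pca_explained_variance(X):
    """Return (explained-variance ratios, loadings) for a data matrix X.

    Rows are observations (students), columns are variables.
    """
    # Standardize each column so variables on different scales are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance of standardized data, i.e. the correlation matrix.
    R = np.cov(Z, rowvar=False)
    # eigh is meant for symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratios = eigvals / eigvals.sum()
    return ratios, eigvecs


def components_for_threshold(ratios, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratios))


# Synthetic stand-in data (assumption): only the shape mirrors the study.
rng = np.random.default_rng(0)
X = rng.normal(size=(2200, 22))
# Correlate a few columns so there is some redundancy for PCA to compress.
X[:, 1] = X[:, 0] + 0.3 * rng.normal(size=2200)
X[:, 3] = X[:, 2] + 0.3 * rng.normal(size=2200)

ratios, loadings = pca_explained_variance(X)
k = components_for_threshold(ratios, 0.90)  # 0.90 is an illustrative cutoff
# Index of the variable with the largest absolute loading on each component.
dominant = np.argmax(np.abs(loadings), axis=0)

print(f"components needed for 90% of variance: {k}")
for i in range(3):
    print(f"PC{i + 1}: {ratios[i]:.2%} of variance, dominant variable index {dominant[i]}")
```

With real student records, the columns would be the twenty-two survey and academic variables, and the dominant-variable index would map back to names such as class attendance. The numbers printed here reflect only the synthetic data.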
Keywords: Nigerian higher education, Random Forest, XGBoost, intelligent analytics, machine learning, student dropout prediction