Mutual information-driven feature selection for Saber Pro test analysis: identifying key socioeconomic and academic predictors of student performance
DOI:
https://doi.org/10.26507/paper.4775
Keywords:
mutual information, educational equity, Saber Pro tests, predictive analytics, socioeconomic determinants, Colombia
Abstract
The Saber Pro tests, designed to evaluate educational quality in Colombia, pose a significant analytical challenge because of the diversity and complexity of the factors that influence academic performance. This study presents a preliminary analysis of socioeconomic and academic variables, with the aim of identifying those most relevant to explaining student achievement. To this end, it employs Mutual Information (MI), a statistical measure of the dependency between two variables that indicates how much information about one can be inferred from the other. MI is particularly useful in this context because it allows variables to be ranked and prioritised by their relevance to predicting academic performance. The analysis is applied to data from students of Uniminuto Virtual who took the Saber Pro exams during the 2021–2023 period, and it evaluates multiple factors associated with participants' socioeconomic and academic backgrounds. The most prominent variables are university tuition fees, academic programme, and geographic location. These were found to have a significant relationship with test scores, which highlights their key role in academic performance. The statistical approach adopted not only identifies the most relevant features but also provides a solid foundation for future studies that aim to implement advanced predictive models. Feature selection through mutual information helps optimise data analysis by reducing model complexity, lowering the risk of overfitting, and improving prediction accuracy. This initial analysis is also essential for ensuring that predictive models are efficient and able to handle the heterogeneity of educational data.
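To make the ranking idea concrete, the following is a minimal sketch of MI-based feature ranking for discrete variables, using the plug-in estimate I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x) p(y))]. It is an illustration only, not the study's implementation: the function names, the column names (`tuition_band`, `programme`, `coin_flip`), and the toy records are all hypothetical, and real Saber Pro data would require preprocessing such as discretising continuous scores.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y), in bits, for two discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(features, target):
    """Return (name, MI score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(values, target)
              for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records: a discretised score band and three made-up features.
target = ["high", "high", "low", "low", "high", "low", "high", "low"]
features = {
    "tuition_band": ["A", "A", "B", "B", "A", "B", "A", "B"],   # fully aligned with target
    "programme": ["eng", "eng", "edu", "eng", "edu", "edu", "eng", "edu"],
    "coin_flip": ["h", "t", "h", "t", "t", "h", "h", "t"],      # independent of target
}

ranking = rank_features(features, target)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

In this toy data the fully aligned feature reaches the entropy of the target (1 bit), the independent one scores 0, and the partially related one falls in between. The plug-in estimator is biased upward on small samples, so scores from a real data set would be better read as a relative ranking than as exact quantities.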
License
Copyright 2025 Asociación Colombiana de Facultades de Ingeniería - ACOFI

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.