Mutual information-driven feature selection for Saber Pro test analysis: identifying key socioeconomic and academic predictors of student performance

Authors

N. Orozco Morales, L. Valderrama García, J. S. Martínez, C. Rincón Guío

DOI:

https://doi.org/10.26507/paper.4775

Keywords:

mutual information, educational equity, Saber Pro tests, predictive analytics, socioeconomic determinants, Colombia

Abstract

The Saber Pro tests, designed to evaluate educational quality in Colombia, are difficult to analyse because many diverse and interacting factors influence academic performance. This study presents a preliminary analysis of socioeconomic and academic variables, aiming to identify those most relevant to explaining student achievement. To this end, it employs Mutual Information (MI), a statistical measure of the dependency between two variables that indicates how much information about one can be inferred from the other. MI is particularly useful here because it allows variables to be ranked and prioritised by their relevance for predicting academic performance. The analysis uses data from Uniminuto Virtual students who took the Saber Pro exams between 2021 and 2023 and evaluates multiple factors related to participants' socioeconomic and academic backgrounds. The most prominent variables are university tuition fees, academic programme, and geographic location, all of which were found to have a significant relationship with test scores. Beyond identifying the most relevant features, the approach provides a foundation for future studies that implement advanced predictive models. Feature selection through mutual information helps optimise data analysis by reducing model complexity, lowering the risk of overfitting, and improving prediction accuracy. This initial analysis is therefore an essential step towards predictive models that are efficient and able to handle the heterogeneity of educational data.
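
As an illustration of the ranking idea described in the abstract, the following is a minimal sketch of MI-based feature ranking for discrete variables. It is not the authors' implementation: the function names, the column names (`programme`, `tuition_band`, `unrelated_flag`) and the eight toy rows are all hypothetical, and the simple plug-in estimator used here is an assumption, since the abstract does not state which MI estimator the study applied.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), expressed with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(features, target):
    """Return (feature name, MI with target) pairs, most informative first."""
    scores = {name: mutual_information(col, target)
              for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data, invented for illustration only.
# These are NOT records or results from the study.
toy_score_band = ["hi", "hi", "lo", "lo", "hi", "hi", "lo", "lo"]
toy_features = {
    "programme":      ["A", "A", "B", "B", "A", "A", "B", "B"],
    "tuition_band":   ["low", "low", "low", "high", "high", "low", "high", "high"],
    "unrelated_flag": ["h", "t", "h", "t", "h", "t", "h", "t"],
}

ranking = rank_features(toy_features, toy_score_band)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

In this toy table `programme` determines the score band exactly, so its MI equals the full 1 bit of uncertainty in the target, while `unrelated_flag` is statistically independent of the target and scores 0. Real Saber Pro scores are numeric, so an analysis along these lines would first need to discretise them or use an estimator designed for continuous variables.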

How to cite

[1]
N. Orozco Morales, L. Valderrama García, J. S. Martínez, and C. Rincón Guío, "Mutual information-driven feature selection for Saber Pro test analysis: identifying key socioeconomic and academic predictors of student performance", EIEI ACOFI, Sep. 2025.

Published

8 September 2025