Transformative learning based on numeracy assessment in Islamic boarding schools: Item response theory

Rosid Bahar, Syafrudin Syafrudin

Abstract

Numeracy skills are a key competency in addressing 21st-century challenges, including in religious-based educational environments such as Islamic boarding schools (pesantren). This study aims to measure students' numeracy skills through a transformative learning approach based on numeracy assessment. The research employs a descriptive quantitative method involving 383 students from four Islamic boarding schools located in different regencies/cities in West Java Province: Purwakarta, Tasikmalaya, and Sukabumi. The assessment instrument was developed based on the Minimum Competency Assessment (AKM) framework and underwent content validity testing with experts, construct validity testing using Exploratory Factor Analysis (EFA), and reliability testing with the Item Response Theory (IRT) approach. The analysis indicates that the Rasch (1PL) model is the most suitable for measuring students' numeracy skills, with item difficulty as the primary indicator. The ability distribution shows that 21.67% of students are in the advanced category, 27.68% proficient, 27.42% basic, and 23.24% require special intervention. These findings suggest that while some students already possess strong numeracy skills, learning reinforcement remains necessary. Overall, this study confirms that numeracy assessment-based transformative learning is highly feasible to implement in pesantren environments, as it offers a contextual, equitable, and data-driven learning approach.
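
For readers unfamiliar with the model named in the abstract, the standard dichotomous Rasch (1PL) formulation is sketched below; the symbols θ_j and b_i are conventional IRT notation, not taken from the article itself. The probability that student j answers item i correctly is

P(X_ij = 1 | θ_j, b_i) = exp(θ_j − b_i) / (1 + exp(θ_j − b_i)),

where θ_j is the student's latent numeracy ability and b_i is the item's difficulty. Under this model, item difficulty is the only item parameter estimated, which is consistent with the abstract's statement that difficulty serves as the primary indicator for measuring numeracy.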

Keywords

Islamic Boarding School; Numeracy; Minimum Competency Assessment (MCA); Item Response Theory (IRT)
