Analysis of Test and Item Quality for Integral Questions in the Formative Evaluation of Engineering Mathematics

Lilis Trianingsih

Abstract


Abstract: Item analysis of multiple-choice questions (MCQs) is an essential tool for identifying items that should be retained, revised, or removed. This study examines item quality in terms of validity, reliability, difficulty index, discrimination index, and distractor effectiveness. It focuses on an item analysis of 35 MCQs administered to 83 Building Engineering Education students, selected by total sampling. Data were collected through the LMS spada.uns.ac.id and analyzed with quantitative descriptive methods using Microsoft Excel and IBM SPSS. The test had a content validity of 0.89 and a construct validity of 97.14%, so it is valid. The Kuder-Richardson reliability was 0.876 and the Intraclass Correlation Coefficient was 0.880, indicating high reliability. For the difficulty index (DIF I), 28 items (80%) had good difficulty, three (8.57%) were too difficult, and four (11.43%) were too easy, with Mean ± SD 61.83% ± 16.61%. For the discrimination index (DI), all 35 items (100%) had acceptable to excellent discrimination, with Mean ± SD 53.90% ± 17.26%. Distractor effectiveness (DE) had Mean ± SD 92.86% ± 17.75%, with 92.86% of distractors functional overall. The study concludes that the integral test items in the formative evaluation of Engineering Mathematics are of good quality for assessing students' cognitive achievement. Further research could investigate the correlations among DIF I, DI, and DE to improve the quality of items in the question bank.
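The abstract reports the indices but not the formulas behind them. The following is a minimal sketch of how such indices are commonly computed in classical test theory. It assumes dichotomous (0/1) scoring, upper and lower groups of 27% for the discrimination index, the KR-20 form of the Kuder-Richardson coefficient, and a 5% selection threshold for a functional distractor. These conventions, all function names, and the toy data are illustrative assumptions and are not taken from the study.

```python
from statistics import pvariance


def difficulty_index(responses, item):
    """Percentage of examinees who answered `item` correctly (0/1 scores)."""
    return 100.0 * sum(r[item] for r in responses) / len(responses)


def discrimination_index(responses, item, fraction=0.27):
    """Upper-group minus lower-group percent correct (assumed 27% groups)."""
    ranked = sorted(responses, key=sum, reverse=True)  # best total score first
    k = max(1, round(fraction * len(ranked)))          # size of each group
    upper = sum(r[item] for r in ranked[:k])
    lower = sum(r[item] for r in ranked[-k:])
    return 100.0 * (upper - lower) / k


def kr20(responses):
    """Kuder-Richardson formula 20, using the population variance of totals."""
    n = len(responses)
    n_items = len(responses[0])
    var_total = pvariance([sum(r) for r in responses])
    pq = 0.0
    for i in range(n_items):
        p = sum(r[i] for r in responses) / n  # proportion correct on item i
        pq += p * (1 - p)
    return n_items / (n_items - 1) * (1 - pq / var_total)


def distractor_effectiveness(choices, key, options="ABCD", threshold=0.05):
    """Percentage of distractors chosen by at least `threshold` of examinees."""
    distractors = [o for o in options if o != key]
    functional = sum(
        1 for o in distractors if choices.count(o) / len(choices) >= threshold
    )
    return 100.0 * functional / len(distractors)


# Hypothetical toy data: 5 examinees x 4 items, scored 0/1.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
# Hypothetical option choices of 20 examinees on one item whose key is "A".
choices = ["A"] * 12 + ["B"] * 5 + ["C"] * 3

print(difficulty_index(scores, 0))
print(discrimination_index(scores, 0))
print(kr20(scores))
print(distractor_effectiveness(choices, "A"))
```

With real data, `scores` would hold one row per student and one column per item, and `choices` would hold the options selected for a single item.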


Keywords


test quality analysis; difficulty index; discrimination index; distractor effectiveness; formative evaluation; item analysis






DOI: https://doi.org/10.20961/ijcee.v9i2.84711



Copyright (c) 2024 Indonesian Journal Of Civil Engineering Education



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.