Grammatical Error Correction (GEC) of Indonesian Text Based on Neural Machine Translation (NMT)

Nike Sartika, Yuda Sukmana

Abstract

Writing errors in Indonesian are common in texts produced in educational, governmental, and mass media settings, and spelling errors are the most frequent type. This research proposes a Grammatical Error Correction (GEC) system for Indonesian based on Neural Machine Translation (NMT), specifically the sequence-to-sequence (seq2seq) approach, which is widely used for English GEC and has achieved performance approaching human level there. The resulting model is deployed as a web-based service that users can easily access. The experiments use artificial datasets derived from several studies on error analysis in Indonesian. The results show that currently available open-source tools such as OpenNMT-py can simplify the training of NMT-based GEC models. However, the small size of the dataset leads to poor predictions on arbitrary, unseen sentences.
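
The abstract says only that the training data are artificial and derived from published Indonesian error analyses; it does not say how the erroneous sentences were produced. The sketch below assumes one common way to build such data: take correct sentences, inject rule-based errors of the kinds those studies report (the preposition "di" or "ke" glued to the next word, a capital letter dropped, two adjacent characters swapped), and write line-aligned source and target files, the plain-text parallel format that seq2seq toolkits such as OpenNMT-py train on. The three corruption rules and all function names (merge_preposition, lowercase_capital, swap_adjacent_chars, make_pair, write_parallel) are hypothetical illustrations, not the authors' method.

```python
import random

# Hypothetical corruption rules: the paper does not say how its artificial
# data were generated. Each rule mimics an error type reported in Indonesian
# error-analysis studies and returns the tokens unchanged if it cannot apply.

# PUEBI writes the prepositions "di" and "ke" apart from the next word
# ("di rumah"), so gluing them on ("dirumah") is a common spelling error.
PREPOSITIONS = ("di", "ke")


def merge_preposition(tokens, rng):
    """Glue a preposition onto the following word: 'di rumah' -> 'dirumah'."""
    positions = [i for i, t in enumerate(tokens[:-1]) if t.lower() in PREPOSITIONS]
    if not positions:
        return tokens
    i = rng.choice(positions)
    return tokens[:i] + [tokens[i] + tokens[i + 1]] + tokens[i + 2:]


def lowercase_capital(tokens, rng):
    """Lowercase the first letter of a capitalised word: 'Jakarta' -> 'jakarta'."""
    positions = [i for i, t in enumerate(tokens) if t[:1].isupper()]
    if not positions:
        return tokens
    i = rng.choice(positions)
    out = list(tokens)
    out[i] = out[i][0].lower() + out[i][1:]
    return out


def swap_adjacent_chars(tokens, rng):
    """Swap two neighbouring letters inside one word (a typing error)."""
    positions = [i for i, t in enumerate(tokens) if len(t) >= 4 and t.isalpha()]
    if not positions:
        return tokens
    i = rng.choice(positions)
    word = tokens[i]
    j = rng.randrange(len(word) - 1)
    out = list(tokens)
    out[i] = word[:j] + word[j + 1] + word[j] + word[j + 2:]
    return out


CORRUPTIONS = (merge_preposition, lowercase_capital, swap_adjacent_chars)


def make_pair(sentence, rng):
    """Return an (erroneous source, correct target) pair, or None if no rule applies."""
    tokens = sentence.split()
    rules = list(CORRUPTIONS)
    rng.shuffle(rules)  # try the error types in random order
    for rule in rules:
        noisy = rule(tokens, rng)
        if noisy != tokens:  # a swap of identical letters changes nothing
            return " ".join(noisy), " ".join(tokens)
    return None


def write_parallel(sentences, src_path, tgt_path, seed=0):
    """Write line-aligned files: line k of src_path is the noisy form of line k of tgt_path."""
    rng = random.Random(seed)
    written = 0
    with open(src_path, "w", encoding="utf-8") as src, \
            open(tgt_path, "w", encoding="utf-8") as tgt:
        for sentence in sentences:
            pair = make_pair(sentence, rng)
            if pair is None:
                continue  # skip sentences that no rule can corrupt
            src.write(pair[0] + "\n")
            tgt.write(pair[1] + "\n")
            written += 1
    return written


if __name__ == "__main__":
    rng = random.Random(42)
    for s in ["Saya tinggal di rumah.", "Ibu pergi ke pasar kemarin."]:
        print(make_pair(s, rng))
```

Here each correct sentence yields at most one noisy copy. Calling make_pair several times per sentence with different random states is one possible way to enlarge a small corpus, which bears on the data-size limitation the abstract reports, although synthetic errors only help as far as they resemble real ones.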

References

Purwandari, H. S., Setiawan, B., and Saddhono, K., "Analisis Kesalahan Berbahasa Indonesia pada Surat Dinas Kantor Kepala Desa Jladri", BASASTRA Jurnal Penelitian Bahasa, Sastra Indonesia dan Pengajarannya, 1(3), 478–489, April 2014.

Sukmawati, Nurhayati, and Iswary, E., "Penggunaan Bahasa Indonesia pada Informasi Layanan Umum dan Layanan Niaga di Kota Kendari", Jurnal Bahasa dan Sastra, 2(1), 3–4, 2013.

Ariningsih, N., Sumarwati, S., and Saddhono, K., "Analisis Kesalahan Berbahasa Indonesia Dalam Karangan Eksposisi Siswa Sekolah Menengah Atas", Jurnal Penelitian Bahasa, Sastra Indonesia, dan Pengajarannya, 1(1), 130–141, 2012.

Khoirurrohman Taufiq, "Analisis Kesalahan Ejaan Dalam Karangan Siswa Kelas 3 Sdn Ketug Kecamatan Butuh Tahun Pelajaran 2017/2018", Jurnal Dialektika Jurusan PGSD, 8(2), 70–77, 2018.

Qhadafi, M. R., "Analisis Kesalahan Penulisan Ejaan yang Disempurnakan dalam Teks Negosiasi Siswa SMA Negeri 3 Palu", Jurnal Bahasa dan Sastra, 3(4), 1–21, 2018.

Asih, A., Tantri, S., and Sutresna, I. B., "Kesalahan Penggunaan Ejaan Bahasa Indonesia dalam Makalah sebagai Alternatif Materi Ajar Ejaan Bahasa Indonesia (EBI)", Prosiding Seminar Nasional V: Bahasa, Sastra, dan Pengajarannya, 191–199, 2018. Retrieved from https://eproceeding.undiksha.ac.id/index.php/semnasbasindo

Leksono, M. L., "Analisis Kesalahan Penggunaan Pedoman Ejaan Bahasa Indonesia (PUEBI) Pada Tugas Makalah dan Laporan Praktikum Mahasiswa IT Telkom Purwokerto", JP-BSI (Jurnal Pendidikan Bahasa dan Sastra Indonesia), 4(2), 116, 2019. https://doi.org/10.26737/jp-bsi.v4i2.1106

Rosdiana, L. A., "Kesalahan Penggunaan Ejaan Bahasa Indonesia (EBI) Pada Karya Ilmiah Mahasiswa", Bahtera Indonesia: Jurnal Penelitian Bahasa dan Sastra Indonesia, 5(1), 1–11, 2020. https://doi.org/10.31943/bi.v5i1.58

Turistiani, T. D., "Fitur Kesalahan Penggunaan Ejaan Yang Disempurnakan Dalam Makalah Mahasiswa", Paramasastra, 1(1), 61–72, 2014. https://doi.org/10.26740/parama.v1i1.1470

Winata, N. T., "Analisis Kesalahan Ejaan Bahasa Indonesia Dalam Media Massa Daring (Detikcom)", Bahtera Indonesia: Jurnal Penelitian Bahasa dan Sastra Indonesia, 4(2), 115–121, 2019. https://doi.org/10.31943/bi.v4i2.52

Fahda, A., and Purwarianti, A., "A statistical and rule-based spelling and grammar checker for Indonesian text", Proceedings of the 2017 International Conference on Data and Software Engineering (ICoDSE 2017), 1–6, 2018. https://doi.org/10.1109/ICODSE.2017.8285846

Qiu, Z., and Qu, Y., "A Two-Stage Model for Chinese Grammatical Error Correction", IEEE Access, 7, 146772–146777, 2019. https://doi.org/10.1109/ACCESS.2019.2940607

Zaky, D., and Romadhony, A., "An LSTM-based Spell Checker for Indonesian Text", Proceedings of the 2019 International Conference on Advanced Informatics: Concepts, Theory, and Applications (ICAICTA 2019), 1–6, 2019. https://doi.org/10.1109/ICAICTA.2019.8904218

Ge, T., Wei, F., and Zhou, M., "Reaching human-level performance in automatic grammatical error correction: An empirical study", arXiv preprint arXiv:1807.01270, 2018.

Heryadi, Y., Wijanarko, B. D., Murad, D. F., Tho, C., and Hashimoto, K., "Neural Machine Translation Approach for Low-resource Languages using Long Short-term Memory Model", 2023 International Conference on Computer Science, Information Technology and Engineering (ICCoSITE), Jakarta, Indonesia, 939–944, 2023. https://doi.org/10.1109/ICCoSITE57641.2023.10127724