KNOTICED: Augmentative and Alternative Communication Software for Language Developmental Disabilities (LREC-COLING 2024)

Authors

  • Sugyeong Eo, Jungwoo Lim, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim

Abstract

Critical error detection (CED) is the task of identifying the inherent risk of catastrophic meaning distortion in machine translation (MT) output. In this paper, we propose KNOTICED, a critical error detection dataset for English-Korean MT. Because reflecting cultural elements is important when detecting critical errors, KNOTICED introduces a new culture-aware “Politeness” type. In addition, by annotating multiclass labels, we support two tasks: critical error detection and critical error type classification (CETC). Empirical evaluations show that our data augmentation approach, which uses a newly presented perturber, significantly outperforms existing baselines on both tasks. Further analysis highlights the value of multiclass labeling by demonstrating that it is more effective than binary labeling. Due to anonymity requirements, we plan to make our dataset and code publicly available upon acceptance.
