Length-aware Byte Pair Encoding for Mitigating Over-segmentation in Korean Machine Translation (ACL-findings 2024)
KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models (ACL-findings 2024)
Leveraging Pre-existing Resources for Data-Efficient Counter-Narrative Generation in Korean (LREC-COLING 2024)
KNOTICED: Augmentative and Alternative Communication Software for Language Developmental Disabilities (LREC-COLING 2024)
Hyper-BTS Dataset: Scalability and Enhanced Analysis of Back TranScription (BTS) for ASR Post-Processing (EACL-findings 2024)
Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation (EACL-findings 2024)
CReTIHC: Designing Causal Reasoning Tasks about Temporal Interventions and Hallucinated Confoundings (EMNLP-findings 2023)
CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients (EMNLP 2023)
Informative Evidence-guided Prompt-based Fine-tuning for English-Korean Critical Error Detection (IJCNLP-AACL 2023)
Uncovering the Risks and Drawbacks Associated with the Use of Synthetic Data for Grammatical Error Correction (IEEE Access 2023)
Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction (ICML-DMLR workshop 2023)
PU-GEN: Enhancing generative commonsense reasoning for language models with human-centered knowledge (Knowledge-Based Systems 2022)
PicTalky: Augmentative and Alternative Communication Software for Language Developmental Disabilities (AACL 2022)
The ASR Post-Processor Performance Challenges of BackTranScription (BTS): Data-Centric and Model-Centric Approaches (Mathematics 2022)
Focus on FoCus: Is FoCus focused on Context, Knowledge and Persona? (Customized Chat Grounding Persona and Knowledge Workshop at COLING 2022)
QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation (COLING 2022)
A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation (NAACL-findings 2022)
BERTOEIC: Solving TOEIC Problems Using Simple and Efficient Data Augmentation Techniques with Pretrained Transformer Encoders (Applied Sciences 2022)
Empirical Analysis of Noising Scheme based Synthetic Data Generation for Automatic Post-editing (LREC 2022)
Dense-to-Question and Sparse-to-Answer: Hybrid Retriever System for Industrial Frequently Asked Questions (Mathematics 2022)
A New Tool for Efficiently Generating Quality Estimation Datasets (Data-centric AI Workshop at NeurIPS 2021)
Automatic Knowledge Augmentation for Generative Commonsense Reasoning (Data-centric AI Workshop at NeurIPS 2021)
Grounded Vocabulary for Image Retrieval Using a Modified Multi-Generator Generative Adversarial Network (IEEE Access 2021)
[Best Paper Award] KommonGen: A Dataset for Korean Generative Commonsense Reasoning Evaluation (HCLT 2021)
[Best Paper Award] Categorization and Analysis of Error Types in the Korean Speech Recognition System (HCLT 2021)
Comparative analysis of current approaches to quality estimation for neural machine translation (Applied Sciences 2021)