NLP & AI Lab., Korea University

About Me

I am currently part of the Natural Language Processing and Artificial Intelligence (NLP&AI) Laboratory, advised by Prof. Heuiseok Lim.

My research interests lie in Natural Language Processing, in particular Information Extraction and Retrieval, Dialogue Systems, and Large Language Models.

News!

01/2026 “I Know, but I Don’t Know! How Persona Conflict Undermines Instruction Adherence in Large Language Models” has been accepted to Findings of EACL 2026.
01/2026 “Evaluating Over-Empathizing in Emotional Support Conversations: A User-Centered Framework” has been accepted to ESWA Journal.

Research Interest

Natural Language Processing
Information Extraction and Retrieval
Dialogue Systems
Large Language Models

Education

Korea University (2022.03 - Present)
- Ph.D. Candidate (Integrated Master's and Ph.D. Course)
- Computer Science and Engineering, Artificial Intelligence Applications
Konkuk University (2018.03 - 2022.02)
- Bachelor's Degree
- Computer Science and Engineering
- Clubs and societies
  - Pseudo Lab’s Paper Reading Track (2021)
  - AI Lab Korea Open Lab 8th (2021)
  - KUSITMS: Korean University Students IT, Management Society (2020)

Publications

International Conference

I Know, but I Don’t Know! How Persona Conflict Undermines Instruction Adherence in Large Language Models

The 19th Conference of the European Chapter of the Association for Computational Linguistics EACL 2026 Findings

Seonmin Koo(*), Jinsung Kim(*), and Heuiseok Lim
Semantic Inversion, Identical Replies: Revisiting Negation Blindness in Large Language Models

The 2025 Conference on Empirical Methods in Natural Language Processing EMNLP 2025 Main

Jinsung Kim(*), Seonmin Koo(*), and Heuiseok Lim
HAWK: Highlighting Entity-aware Knowledge for Alleviating Information Sparsity in Long Contexts

The 2025 Conference on Empirical Methods in Natural Language Processing EMNLP 2025 Findings

Seonmin Koo(*), Jinsung Kim(*), Chanjun Park, and Heuiseok Lim
LimaCost: Data Valuation for Instruction Tuning of Large Language Models

The 2025 Conference on Empirical Methods in Natural Language Processing EMNLP 2025 Findings

Hyeonseok Moon, Jaehyung Seo, Seonmin Koo, Jinsung Kim, Young-kyoung Ham, Jiwon Moon, and Heuiseok Lim
Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts

The 2024 Conference on Empirical Methods in Natural Language Processing EMNLP 2024 Main

Seonmin Koo(*), Jinsung Kim(*), YoungJoon Jang, Chanjun Park, and Heuiseok Lim
PANDA: Persona Attributes Navigation for Detecting and Alleviating Overuse Problem in Large Language Models

The 2024 Conference on Empirical Methods in Natural Language Processing EMNLP 2024 Main

Jinsung Kim(*), Seonmin Koo(*), and Heuiseok Lim
Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models

The 2024 Conference on Empirical Methods in Natural Language Processing EMNLP 2024 Findings

Seonmin Koo(*), Jinsung Kim(*), Chanjun Park, and Heuiseok Lim
Revisiting Under-represented Knowledge of Latin American Literature in Large Language Models

The 27th European Conference on Artificial Intelligence ECAI 2024 Main

Jinsung Kim(*), Seonmin Koo(*), and Heuiseok Lim
Detecting Critical Errors Considering Cross-Cultural Factors in English-Korean Translation

The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation LREC-COLING 2024

Sugyeong Eo, Jungwoo Lim, Chanjun Park, Dahyun Jung, Seonmin Koo, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing

The 2023 Conference on Empirical Methods in Natural Language Processing EMNLP 2023 Main

Seonmin Koo(*), Chanjun Park(*), Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, and Heuiseok Lim
Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline

International Conference on Machine Learning (ICML) 2023 – DataPerf (workshop) ICML 2023 Workshop

Seonmin Koo(*), Chanjun Park(*), Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction

International Conference on Machine Learning (ICML) 2023 – DataPerf (workshop) ICML 2023 Workshop

Chanjun Park(*), Seonmin Koo(*), Seolhwa Lee(*), Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation

The 2022 Conference of the North American Chapter of the Association for Computational Linguistics NAACL 2022 Findings

Jaehyung Seo, Seounghoon Lee, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim

International Journal

Evaluating Over-Empathizing in Emotional Support Conversations: A User-Centered Framework

Expert Systems with Applications (ESWA), 131059, 2026

Suhyune Son, Seonmin Koo, Evelyn H. Zi, Jungsun Jang, and Heuiseok Lim
A Large-Scale Dataset for Korean Document-level Relation Extraction from Encyclopedia Texts

Applied Intelligence, 54(17), 8681-8701, 2024

Suhyune Son, Jungwoo Lim, Seonmin Koo, Jinsung Kim, Younghoon Kim, Youngsik Lim, Dongseok Hyun, and Heuiseok Lim
A Multi-Faceted Exploration Incorporating Question Difficulty in Knowledge Tracing for English Proficiency Assessment

Electronics, 12(19), 4171, 2023

Jinsung Kim(*), Seonmin Koo(*), and Heuiseok Lim
Doubts on the reliability of parallel corpus filtering

Expert Systems with Applications (ESWA), 233, 120962, 2023

Hyeonseok Moon, Chanjun Park, Seonmin Koo, Jungseob Lee, Seungjun Lee, Jaehyung Seo, Sugyeong Eo, Yoonna Jang, Hyunjoong Kim, Hyoung-gyu Lee, Heuiseok Lim
Uncovering the Risks and Drawbacks Associated with the Use of Synthetic Data for Grammatical Error Correction

IEEE Access, 2023

Seonmin Koo, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
A Survey on Evaluation Metrics for Machine Translation

Mathematics, 2023

Seungjun Lee, Jungseob Lee, Hyeonseok Moon, Chanjun Park, Jaehyung Seo, Sugyeong Eo, Seonmin Koo, Heuiseok Lim
K-NCT: Korean Neural Grammatical Error Correction Gold-Standard Test Set Using Novel Error Type Classification Criteria

IEEE Access, 2022

Seonmin Koo(*), Chanjun Park(*), Jaehyung Seo, Seungjun Lee, Hyeonseok Moon, Jungseob Lee, Heuiseok Lim

Domestic Conference

A Study on the Persona Updating Capabilities of Large Language Models in Dialogue (대화 상황에서 거대 언어 모델의 페르소나 갱신 능력 검증 연구)

The 2025 Joint Conference on Human and Cognitive Language Technology, Korean Association for Corpus Linguistics (HCLT 2025, 한글 및 한국어정보처리 & 한국코퍼스언어학회 공동 학술대회)

Seonmin Koo, Jinsung Kim, Kinam Park, and Heuiseok Lim
Data Augmentation for Negotiation Dialogues in E-Commerce via Scenario Transfer (시나리오 전이를 활용한 전자상거래 환경에서의 협상 대화 데이터 증강)

The 2025 Joint Conference on Human and Cognitive Language Technology, Korean Association for Corpus Linguistics (HCLT 2025, 한글 및 한국어정보처리 & 한국코퍼스언어학회 공동 학술대회)

Jinsung Kim, Seonmin Koo, Kinam Park, and Heuiseok Lim
User-Centered Evaluation of LLMs in Emotional Support Conversations (정서적 지지 대화에서 대규모 언어모델의 사용자 경험 중심 평가 프레임워크)

The 2025 Joint Conference on Human and Cognitive Language Technology, Korean Association for Corpus Linguistics (HCLT 2025, 한글 및 한국어정보처리 & 한국코퍼스언어학회 공동 학술대회)

Suhyune Son, Seonmin Koo, Hayoon Zi, JeongBae Park, Heuiseok Lim
Examining the Ability of Large Language Model on Entity-Based Korean Question-Answering (엔티티 기반 추론을 통한 거대 언어 모델의 한국어 질의응답 능력 연구)

The 36th Annual Conference on Human & Cognitive Language Technology (HCLT 2024, 한글 및 한국어정보처리 학술대회)

Seonmin Koo, Jinsung Kim, Chanjun Park, Kinam Park, and Heuiseok Lim
Exploring Korean Question Answering Ability of Large Language Model by Question Corruption (질의 변형에 따른 거대 언어모델의 한국어 질의응답 능력 연구)

The 36th Annual Conference on Human & Cognitive Language Technology (HCLT 2024, 한글 및 한국어정보처리 학술대회)

Jinsung Kim, Seonmin Koo, Kinam Park, and Heuiseok Lim
Examining the Feasibility of Utilizing a Large Language Model for Korean Grammatical Error Correction (한국어 맞춤법 교정을 위한 초거대 언어 모델의 잠재적 능력 탐색)

The 35th Annual Conference on Human & Cognitive Language Technology (HCLT 2023, 한글 및 한국어정보처리 학술대회)

Seonmin Koo, Chanjun Park, JeongBae Park, and Heuiseok Lim
Automatic Generation of Training Data for Korean Speech Recognition Post-Processor (한국어 음성인식 후처리기를 위한 학습 데이터 자동 생성 방안)

The 34th Annual Conference on Human & Cognitive Language Technology (HCLT 2022, 한글 및 한국어정보처리 학술대회)

Seonmin Koo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Sugyeong Eo, Yuna Hur, and Heuiseok Lim
KoCED: English-Korean Critical Error Detection Dataset (KoCED: 윤리 및 사회적 문제를 초래하는 기계번역 오류 탐지를 위한 학습 데이터셋)

The 34th Annual Conference on Human & Cognitive Language Technology (HCLT 2022, 한글 및 한국어정보처리 학술대회)

Sugyeong Eo, Suwon Choi, Seonmin Koo, Dahyun Jung, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Jeongbae Park, and Heuiseok Lim
Error Type Categorization for deep learning-based Korean Spelling Correction (딥러닝 기반 한국어 맞춤법 교정 연구를 위한 오류 유형 분류)

Korea Software Congress 2021 (KSC2021, 한국소프트웨어종합학술대회)

Seonmin Koo, Chanjun Park, and Heuiseok Lim

Domestic Journal

Classification and analysis of error types for deep learning-based Korean spelling correction (딥러닝 기반 한국어 맞춤법 교정을 위한 오류 유형 분류 및 분석)

Journal of the Korea Convergence Society, 12(12), 65-74.

Seonmin Koo, Chanjun Park, Aram So, and Heuiseok Lim

Projects

Hyundai Motor Company (2024-2025)
- Development of TableQA model and framework
- In collaboration with the Infotainment department of Hyundai Motor Company
KT AI/LLM Benchmark (2024-2025)
- Building LLM benchmark datasets in Korean
- Designing appropriate evaluation methods for the benchmark datasets and evaluating performance
- In collaboration with KT
Miri Canvas Company (2024-2024)
- Development of a model for extracting compatible elements
- A model that identifies the images (elements) most compatible with the surrounding design elements, based on the elements returned for the input text
- In collaboration with Miri Canvas
CYD ASR Post-Processing (2022-2024)
- Generating post-processor data through an automatic parallel corpus generation methodology and a noisy method
- Developing an ASR post-processor that improves performance by correcting speech recognition results
NC Soft Persona Dialogue System (2023-2023)
- Persona-grounded dialogue generation framework development
- Personal information and dialogue history datastore construction w/ kNN method
- kNN datastore + PLM inference (e.g., DialoGPT, GODEL)
- Persona data augmentation w/ knowledge base and LLM
- In collaboration with Language Understanding team, Dialogue team, and LLM team of NC Soft
KIGAM Mineral Deposit Prediction Project (2023-2023)
- Development of Ni, Co, Li mineral deposit prediction model using AI
- In collaboration with Korea Institute of Geoscience and Mineral Resources (KIGAM)
Naver Corporation (2022-2023)
- Korean document-based relation extraction (RE) framework development
- Semi-automated RE data construction method design
- Model training w/ Naver encyclopedia
- In collaboration with Naver Encyclopedia team
Naver Papago (2021-2022)
- Development of high-performance parallel corpus filtering technology
- A parallel corpus filtering methodology that automatically identifies and removes data unsuitable for training
- In collaboration with Naver Papago team

Awards and Honors

Received Korea University Best Paper Award 2023

A Little More About Me

Certificates
- National Science and Technology Big Data Analysis Article (2021)
- SQLD: Structured Query Language Developer (2020)
- ADsP: Advanced Data Analytics Semi-Professional (2019)
Hobbies
- Must-go Restaurant Tour
- Taekwondo

Seonmin Koo
