About Me
I am currently part of the Natural Language Processing and Artificial Intelligence (NLP&AI) Laboratory, advised by Prof. Heuiseok Lim.
My research interests lie in Natural Language Processing, in particular Information Extraction and Retrieval, Dialogue Systems, and Large Language Models.
News!
- 08/2025 “Semantic Inversion, Identical Replies: Revisiting Negation Blindness in Large Language Models” has been accepted to EMNLP 2025.
- 08/2025 “HAWK: Highlighting Entity-aware Knowledge for Alleviating Information Sparsity in Long Contexts” has been accepted to Findings of EMNLP 2025.
- 08/2025 “LimaCost: Data Valuation for Instruction Tuning of Large Language Models” has been accepted to Findings of EMNLP 2025.
Research Interest
- Natural Language Processing
- Information Extraction and Retrieval
- Dialogue Systems
- Large Language Models
Education
- Korea University (2022.03 - Present)
    - Ph.D. Candidate (Integrated Master's and Ph.D. Course)
    - Computer Science and Engineering, Artificial Intelligence Applications
 
- Konkuk University (2018.03 - 2022.02)
    - Bachelor's Degree
    - Computer Science and Engineering
    - Clubs and societies
        - Pseudo Lab’s Paper Reading Track (2021)
        - AI Lab Korea Open Lab 8th (2021)
        - KUSITMS: Korean University Students IT, Management Society (2020)
 
 
Publications
International Conference
- Semantic Inversion, Identical Replies: Revisiting Negation Blindness in Large Language Models
    - The 2025 Conference on Empirical Methods in Natural Language Processing, EMNLP 2025 Main
    - Jinsung Kim(*), Seonmin Koo(*), and Heuiseok Lim
- HAWK: Highlighting Entity-aware Knowledge for Alleviating Information Sparsity in Long Contexts
    - The 2025 Conference on Empirical Methods in Natural Language Processing, EMNLP 2025 Findings
    - Seonmin Koo(*), Jinsung Kim(*), Chanjun Park, and Heuiseok Lim
- LimaCost: Data Valuation for Instruction Tuning of Large Language Models
    - The 2025 Conference on Empirical Methods in Natural Language Processing, EMNLP 2025 Findings
    - Hyeonseok Moon, Jaehyung Seo, Seonmin Koo, Jinsung Kim, Young-kyoung Ham, Jiwon Moon, and Heuiseok Lim
- Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts
    - The 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 Main
    - Seonmin Koo(*), Jinsung Kim(*), YoungJoon Jang, Chanjun Park, and Heuiseok Lim
- PANDA: Persona Attributes Navigation for Detecting and Alleviating Overuse Problem in Large Language Models
    - The 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 Main
    - Jinsung Kim(*), Seonmin Koo(*), and Heuiseok Lim
- Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models
    - The 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 Findings
    - Seonmin Koo(*), Jinsung Kim(*), Chanjun Park, and Heuiseok Lim
- Revisiting Under-represented Knowledge of Latin American Literature in Large Language Models
    - The 27th European Conference on Artificial Intelligence, ECAI 2024 Main
    - Jinsung Kim(*), Seonmin Koo(*), and Heuiseok Lim
- KNOTICED: A Dataset for Critical Error Detection in English-Korean Machine Translation Considering Cross-Cultural Factors in English-Korean Translation
    - The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024
    - Sugyeong Eo, Jungwoo Lim, Chanjun Park, Dahyun Jung, Seonmin Koo, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
- KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing
    - The 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 Main
    - Seonmin Koo(*), Chanjun Park(*), Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, and Heuiseok Lim
- Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline
    - International Conference on Machine Learning (ICML) 2023 – DataPerf (workshop), ICML 2023 Workshop
    - Seonmin Koo(*), Chanjun Park(*), Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
- Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction
    - International Conference on Machine Learning (ICML) 2023 – DataPerf (workshop), ICML 2023 Workshop
    - Chanjun Park(*), Seonmin Koo(*), Seolhwa Lee(*), Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
- A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation
    - The 2022 Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2022 Findings
    - Jaehyung Seo, Seounghoon Lee, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim
International Journal
- A Large-Scale Dataset for Korean Document-level Relation Extraction from Encyclopedia Texts
    - Applied Intelligence, open access, 2024
    - Suhyune Son, Jungwoo Lim, Seonmin Koo, Jinsung Kim, Younghoon Kim, Youngsik Lim, Dongseok Hyun, and Heuiseok Lim
- A Multi-Faceted Exploration Incorporating Question Difficulty in Knowledge Tracing for English Proficiency Assessment
    - Electronics, 12(19), 4171, 2023
    - Jinsung Kim(*), Seonmin Koo(*), and Heuiseok Lim
- Doubts on the reliability of parallel corpus filtering
    - Expert Systems with Applications (ESWA), 233, 120962, 2023
    - Hyeonseok Moon, Chanjun Park, Seonmin Koo, Jungseob Lee, Seungjun Lee, Jaehyung Seo, Sugyeong Eo, Yoonna Jang, Hyunjoong Kim, Hyoung-gyu Lee, Heuiseok Lim
- Uncovering the Risks and Drawbacks Associated with the Use of Synthetic Data for Grammatical Error Correction
    - IEEE Access, 2023
    - Seonmin Koo, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
- A Survey on Evaluation Metrics for Machine Translation
    - Mathematics, 2023
    - Seungjun Lee, Jungseob Lee, Hyeonseok Moon, Chanjun Park, Jaehyung Seo, Sugyeong Eo, Seonmin Koo, Heuiseok Lim
- K-NCT: Korean Neural Grammatical Error Correction Gold-Standard Test Set Using Novel Error Type Classification Criteria
    - IEEE Access, 2022
    - Seonmin Koo(*), Chanjun Park(*), Jaehyung Seo, Seungjun Lee, Hyeonseok Moon, Jungseob Lee, Heuiseok Lim
Domestic Conference
- A Study on the Persona Updating Capabilities of Large Language Models in Dialogue (대화 상황에서 거대 언어 모델의 페르소나 갱신 능력 검증 연구)
    - The 2025 Joint Conference on Human and Cognitive Language Technology, Korean Association for Corpus Linguistics (HCLT 2025, 한글 및 한국어정보처리 & 한국코퍼스언어학회 공동 학술대회)
    - Seonmin Koo, Jinsung Kim, Kinam Park, and Heuiseok Lim
- Data Augmentation for Negotiation Dialogues in E-Commerce via Scenario Transfer (시나리오 전이를 활용한 전자상거래 환경에서의 협상 대화 데이터 증강)
    - The 2025 Joint Conference on Human and Cognitive Language Technology, Korean Association for Corpus Linguistics (HCLT 2025, 한글 및 한국어정보처리 & 한국코퍼스언어학회 공동 학술대회)
    - Jinsung Kim, Seonmin Koo, Kinam Park, and Heuiseok Lim
- User-Centered Evaluation of LLMs in Emotional Support Conversations (정서적 지지 대화에서 대규모 언어모델의 사용자 경험 중심 평가 프레임워크)
    - The 2025 Joint Conference on Human and Cognitive Language Technology, Korean Association for Corpus Linguistics (HCLT 2025, 한글 및 한국어정보처리 & 한국코퍼스언어학회 공동 학술대회)
    - Suhyune Son, Seonmin Koo, Hayoon Zi, JeongBae Park, Heuiseok Lim
- Examining the Ability of Large Language Model on Entity-Based Korean Question-Answering (엔티티 기반 추론을 통한 거대 언어 모델의 한국어 질의응답 능력 연구)
    - The 36th Annual Conference on Human & Cognitive Language Technology (HCLT 2024, 한글 및 한국어정보처리 학술대회)
    - Seonmin Koo, Jinsung Kim, Chanjun Park, Kinam Park, and Heuiseok Lim
- Exploring Korean Question Answering Ability of Large Language Model by Question Corruption (질의 변형에 따른 거대 언어모델의 한국어 질의응답 능력 연구)
    - The 36th Annual Conference on Human & Cognitive Language Technology (HCLT 2024, 한글 및 한국어정보처리 학술대회)
    - Jinsung Kim, Seonmin Koo, Kinam Park, and Heuiseok Lim
- Examining the Feasibility of Utilizing a Large Language Model for Korean Grammatical Error Correction (한국어 맞춤법 교정을 위한 초거대 언어 모델의 잠재적 능력 탐색)
    - The 35th Annual Conference on Human & Cognitive Language Technology (HCLT 2023, 한글 및 한국어정보처리 학술대회)
    - Seonmin Koo, Chanjun Park, JeongBae Park, and Heuiseok Lim
- Automatic Generation of Training Data for Korean Speech Recognition Post-Processor (한국어 음성인식 후처리기를 위한 학습 데이터 자동 생성 방안)
    - The 34th Annual Conference on Human & Cognitive Language Technology (HCLT 2022, 한글 및 한국어정보처리 학술대회)
    - Seonmin Koo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Sugyeong Eo, Yuna Hur, and Heuiseok Lim
- KoCED: English-Korean Critical Error Detection Dataset (KoCED: 윤리 및 사회적 문제를 초래하는 기계번역 오류 탐지를 위한 학습 데이터셋)
    - The 34th Annual Conference on Human & Cognitive Language Technology (HCLT 2022, 한글 및 한국어정보처리 학술대회)
    - Sugyeong Eo, Suwon Choi, Seonmin Koo, Dahyun Jung, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Jeongbae Park, and Heuiseok Lim
- Error Type Categorization for deep learning-based Korean Spelling Correction (딥러닝 기반 한국어 맞춤법 교정 연구를 위한 오류 유형 분류)
    - Korea Software Congress 2021 (KSC2021, 한국소프트웨어종합학술대회)
    - Seonmin Koo, Chanjun Park, and Heuiseok Lim
Domestic Journal
- Classification and analysis of error types for deep learning-based Korean spelling correction (딥러닝 기반 한국어 맞춤법 교정을 위한 오류 유형 분류 및 분석)
    - Journal of the Korea Convergence Society, 12(12), 65-74
    - Seonmin Koo, Chanjun Park, Aram So, and Heuiseok Lim
Projects
- Hyundai Motor Company (2024-2025)
    - Development of a TableQA model and framework
    - In collaboration with the Infotainment department of Hyundai Motor Company
 
- KT AI/LLM Benchmark (2024-2025)
    - Building LLM benchmark datasets in Korean
    - Designing appropriate evaluation methods for the benchmark datasets and evaluating performance
    - In collaboration with KT
 
- Miri Canvas Company (2024)
    - Development of a model for extracting compatible elements
    - A model that identifies the images (elements) most compatible with the surrounding design elements, chosen from the elements returned for the input text
    - In collaboration with Miri Canvas
 
- CYD ASR Post-Processing (2022-2024)
    - Generating post-processor data through an automatic parallel corpus generation methodology and a noising method
    - Developing an ASR post-processor that improves performance by correcting speech recognition results
 
- NC Soft Persona Dialogue System (2023)
    - Persona-grounded dialogue generation framework development
    - Personal information and dialogue history datastore construction w/ kNN method
    - kNN datastore + PLM inference (e.g., DialoGPT, GODEL)
    - Persona data augmentation w/ knowledge base and LLM
    - In collaboration with the Language Understanding team, Dialogue team, and LLM team of NC Soft
 
- KIGAM Mineral Deposit Prediction Project (2023)
    - Development of an AI-based prediction model for Ni, Co, and Li mineral deposits
    - In collaboration with the Korea Institute of Geoscience and Mineral Resources (KIGAM)
 
- Naver Corporation (2022-2023)
    - Korean document-based relation extraction (RE) framework development
    - Semi-automated RE data construction method design
    - Model training w/ Naver encyclopedia
    - In collaboration with the Naver Encyclopedia team
 
- Naver Papago (2021-2022)
    - Development of high-performance parallel corpus filtering technology
    - A parallel corpus filtering methodology that automatically identifies and removes data unsuitable for training
    - In collaboration with the Naver Papago team
 
Awards and Honors
- Received Korea University Best Paper Award 2023
A Little More About Me
- Certificates
    - National Science and Technology Big Data Analysis Article (2021)
    - SQLD: Structured Query Language Developer (2020)
    - ADsP: Advanced Data Analytics Semi-Professional (2019)
 
- Hobbies
    - Must-go Restaurant Tour
    - Taekwondo