About Me
Seonmin Koo received a B.S. degree from the Department of Computer Science and Engineering, Konkuk University, Seoul, South Korea, in 2022. She is currently pursuing a Ph.D. degree in computer science and engineering at Korea University, Seoul, South Korea.
In 2021, she worked as a research student at ETRI’s Language Intelligence Laboratory. She is currently part of the Natural Language Processing and Artificial Intelligence (NLP&AI) Laboratory, advised by Prof. Heuiseok Lim. Her research interests include natural language processing, in particular, post-processing, information extraction and retrieval, dialogue systems, and large language models.
News!
- 09/2024 “Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts” has been accepted to EMNLP 2024.
- 09/2024 “PANDA: Persona Attributes Navigation for Detecting and Alleviating Overuse Problem in Large Language Models” has been accepted to EMNLP 2024.
- 09/2024 “Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models” has been accepted to Findings of EMNLP 2024.
Research Interests
- Natural Language Processing
- Post-processing
- Information Extraction and Retrieval
- Dialogue Systems
- Large Language Models
Education
- Korea University (2022.03 - Present)
- Ph.D. Candidate (Integrated Master's and Ph.D. Course)
- Computer Science and Engineering Artificial Intelligence Applications
- Konkuk University (2018.03 - 2022.02)
- Bachelor's Degree
- Computer Science and Engineering
- Clubs and societies
- Pseudo Lab’s Paper Reading Track (2021)
- AI Lab Korea Open Lab 8th (2021)
- KUSITMS: Korean University Students IT, Management Society (2020)
Publications
International Conference
-
Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts
The 2024 Conference on Empirical Methods in Natural Language Processing EMNLP 2024 Main
Seonmin Koo(*), Jinsung Kim(*), YoungJoon Jang, Chanjun Park, and Heuiseok Lim
-
PANDA: Persona Attributes Navigation for Detecting and Alleviating Overuse Problem in Large Language Models
The 2024 Conference on Empirical Methods in Natural Language Processing EMNLP 2024 Main
Jinsung Kim(*), Seonmin Koo(*), and Heuiseok Lim
-
Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models
The 2024 Conference on Empirical Methods in Natural Language Processing EMNLP 2024 Findings
Seonmin Koo(*), Jinsung Kim(*), Chanjun Park, and Heuiseok Lim
-
Revisiting Under-represented Knowledge of Latin American Literature in Large Language Models
The 27th European Conference on Artificial Intelligence ECAI 2024 Main
Jinsung Kim(*), Seonmin Koo(*), and Heuiseok Lim
-
KNOTICED: A Dataset for Critical Error Detection in English-Korean Machine Translation Considering Cross-Cultural Factors in English-Korean Translation
The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation LREC-COLING 2024
Sugyeong Eo, Jungwoo Lim, Chanjun Park, Dahyun Jung, Seonmin Koo, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
-
KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing
The 2023 Conference on Empirical Methods in Natural Language Processing EMNLP 2023 Main
Seonmin Koo(*), Chanjun Park(*), Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, and Heuiseok Lim
-
Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline
International Conference on Machine Learning (ICML) 2023 – DataPerf (workshop) ICML 2023 Workshop
Seonmin Koo(*), Chanjun Park(*), Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
-
Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction
International Conference on Machine Learning (ICML) 2023 – DataPerf (workshop) ICML 2023 Workshop
Chanjun Park(*), Seonmin Koo(*), Seolhwa Lee(*), Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
-
A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation
The 2022 Conference of the North American Chapter of the Association for Computational Linguistics NAACL 2022 Findings
Jaehyung Seo, Seounghoon Lee, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim
International Journal
-
A Large-Scale Dataset for Korean Document-level Relation Extraction from Encyclopedia Texts
Applied Intelligence, open access, 2024
Suhyune Son, Jungwoo Lim, Seonmin Koo, Jinsung Kim, Younghoon Kim, Youngsik Lim, Dongseok Hyun, and Heuiseok Lim
-
A Multi-Faceted Exploration Incorporating Question Difficulty in Knowledge Tracing for English Proficiency Assessment
Electronics, 12(19), 4171, 2023
Jinsung Kim(*), Seonmin Koo(*), and Heuiseok Lim
-
Doubts on the reliability of parallel corpus filtering
Expert Systems with Applications (ESWA), 233, 120962, 2023
Hyeonseok Moon, Chanjun Park, Seonmin Koo, Jungseob Lee, Seungjun Lee, Jaehyung Seo, Sugyeong Eo, Yoonna Jang, Hyunjoong Kim, Hyoung-gyu Lee, Heuiseok Lim
-
Uncovering the Risks and Drawbacks Associated with the Use of Synthetic Data for Grammatical Error Correction
IEEE Access, 2023
Seonmin Koo, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
-
A Survey on Evaluation Metrics for Machine Translation
Mathematics, 2023
Seungjun Lee, Jungseob Lee, Hyeonseok Moon, Chanjun Park, Jaehyung Seo, Sugyeong Eo, Seonmin Koo, Heuiseok Lim
-
K-NCT: Korean Neural Grammatical Error Correction Gold-Standard Test Set Using Novel Error Type Classification Criteria
IEEE Access, 2022
Seonmin Koo(*), Chanjun Park(*), Jaehyung Seo, Seungjun Lee, Hyeonseok Moon, Jungseob Lee, Heuiseok Lim
Domestic Conference
-
Examining the Ability of Large Language Model on Entity-Based Korean Question-Answering (엔티티 기반 추론을 통한 거대 언어 모델의 한국어 질의응답 능력 연구)
The 36th Annual Conference on Human & Cognitive Language Technology (HCLT 2024, 한글 및 한국어정보처리 학술대회)
Seonmin Koo, Jinsung Kim, Chanjun Park, Kinam Park, and Heuiseok Lim
-
Exploring Korean Question Answering Ability of Large Language Model by Question Corruption (질의 변형에 따른 거대 언어모델의 한국어 질의응답 능력 연구)
The 36th Annual Conference on Human & Cognitive Language Technology (HCLT 2024, 한글 및 한국어정보처리 학술대회)
Jinsung Kim, Seonmin Koo, Kinam Park, and Heuiseok Lim
-
Examining the Feasibility of Utilizing a Large Language Model for Korean Grammatical Error Correction (한국어 맞춤법 교정을 위한 초거대 언어 모델의 잠재적 능력 탐색)
The 35th Annual Conference on Human & Cognitive Language Technology (HCLT 2023, 한글 및 한국어정보처리 학술대회)
Seonmin Koo, Chanjun Park, JeongBae Park, and Heuiseok Lim
-
Automatic Generation of Training Data for Korean Speech Recognition Post-Processor (한국어 음성인식 후처리기를 위한 학습 데이터 자동 생성 방안)
The 34th Annual Conference on Human & Cognitive Language Technology (HCLT 2022, 한글 및 한국어정보처리 학술대회)
Seonmin Koo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Sugyeong Eo, Yuna Hur, and Heuiseok Lim
-
KoCED: English-Korean Critical Error Detection Dataset (KoCED: 윤리 및 사회적 문제를 초래하는 기계번역 오류 탐지를 위한 학습 데이터셋)
The 34th Annual Conference on Human & Cognitive Language Technology (HCLT 2022, 한글 및 한국어정보처리 학술대회)
Sugyeong Eo, Suwon Choi, Seonmin Koo, Dahyun Jung, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Jeongbae Park, and Heuiseok Lim
-
Error Type Categorization for deep learning-based Korean Spelling Correction (딥러닝 기반 한국어 맞춤법 교정 연구를 위한 오류 유형 분류)
Korea Software Congress 2021 (KSC2021, 한국소프트웨어종합학술대회)
Seonmin Koo, Chanjun Park, and Heuiseok Lim
Domestic Journal
-
Classification and analysis of error types for deep learning-based Korean spelling correction (딥러닝 기반 한국어 맞춤법 교정을 위한 오류 유형 분류 및 분석)
Journal of the Korea Convergence Society, 12(12), 65-74.
Seonmin Koo, Chanjun Park, Aram So, and Heuiseok Lim
Projects
- Hyundai Motor Company (2024-Present)
- Development of TableQA model and framework
- In collaboration with the Infotainment department of Hyundai Motor Company
- KT AI/LLM Benchmark (2024-Present)
- Building LLM benchmark datasets in Korean
- Designing appropriate evaluation methods for the benchmark datasets and evaluating performance
- In collaboration with KT
- Miri Canvas Company (2024-2024)
- Development of a model for extracting compatible elements
- A model that identifies the images (elements) most compatible with the surrounding design elements, based on the elements returned for the input text
- In collaboration with Miri Canvas
- CYD ASR Post-Processing (2022-2024)
- Generating post-processor data through an automatic parallel corpus generation methodology and a noisy method
- Developing an ASR post-processor that improves performance by correcting speech recognition results
- NC Soft Persona Dialogue System (2023-2023)
- Persona-grounded dialogue generation framework development
- Personal information and dialogue history datastore construction w/ kNN method
- kNN datastore + PLM inference (e.g., DialoGPT, GODEL)
- Persona data augmentation w/ knowledge base and LLM
- In collaboration with Language Understanding team, Dialogue team, and LLM team of NC Soft
- KIGAM Mineral Deposit Prediction Project (2023-2023)
- Development of Ni, Co, Li mineral deposit prediction model using AI
- In collaboration with Korea Institute of Geoscience and Mineral Resources (KIGAM)
- Naver Corporation (2022-2023)
- Korean document-based relation extraction (RE) framework development
- Semi-automated RE data construction method design
- Model training w/ Naver encyclopedia
- In collaboration with Naver Encyclopedia team
- Naver Papago (2021-2022)
- Development of high-performance parallel corpus filtering technology
- A parallel corpus filtering methodology that automatically identifies and removes data unsuitable for training
- In collaboration with Naver Papago team
Awards and Honors
- Received Korea University Best Paper Award 2023
A Little More About Me
- Certificates
- Big Data Analysis Engineer, National Technical Qualification (2021)
- SQLD: Structured Query Language Developer (2020)
- ADsP: Advanced Data Analytics Semi-Professional (2019)
- Hobbies
- Must-go Restaurant Tour
- Taekwondo