Seonmin Koo

NLP & AI Lab., Korea University

About Me

Seonmin Koo received a B.S. degree from the Department of Computer Science and Engineering, Konkuk University, Seoul, South Korea, in 2022. She is currently pursuing a Ph.D. degree in computer science and engineering at Korea University, Seoul, South Korea.

In 2021, she worked as a research student at ETRI’s Language Intelligence Laboratory. She is currently part of the Natural Language Processing and Artificial Intelligence (NLP&AI) Laboratory, advised by Prof. Heuiseok Lim. Her research interests include natural language processing, in particular, post-processing, information extraction and retrieval, dialogue systems, and large language models.

News!

  • 09/2024 “Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts” has been accepted to EMNLP 2024.
  • 09/2024 “PANDA: Persona Attributes Navigation for Detecting and Alleviating Overuse Problem in Large Language Models” has been accepted to EMNLP 2024.
  • 09/2024 “Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models” has been accepted to Findings of EMNLP 2024.

Research Interests

  • Natural Language Processing
  • Post-processing
  • Information Extraction and Retrieval
  • Dialogue System
  • Large Language Model

Education

  • Korea University (2022.03 - Present)
    • Ph.D. Candidate (Integrated Master's & Ph.D. Course)
    • Computer Science and Engineering Artificial Intelligence Applications
  • Konkuk University (2018.03 - 2022.02)
    • Bachelor's Degree
    • Computer Science and Engineering

    • Clubs and societies
      • Pseudo Lab’s Paper Reading Track (2021)
      • AI Lab Korea Open Lab 8th (2021)
      • KUSITMS: Korean University Students IT, Management Society (2020)

Publications

International Conference

  • Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts

    The 2024 Conference on Empirical Methods in Natural Language Processing EMNLP 2024 Main

    Seonmin Koo(*), Jinsung Kim(*), YoungJoon Jang, Chanjun Park, and Heuiseok Lim

  • PANDA: Persona Attributes Navigation for Detecting and Alleviating Overuse Problem in Large Language Models

    The 2024 Conference on Empirical Methods in Natural Language Processing EMNLP 2024 Main

    Jinsung Kim(*), Seonmin Koo(*), and Heuiseok Lim

  • Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models

    The 2024 Conference on Empirical Methods in Natural Language Processing EMNLP 2024 Findings

    Seonmin Koo(*), Jinsung Kim(*), Chanjun Park, and Heuiseok Lim

  • Revisiting Under-represented Knowledge of Latin American Literature in Large Language Models

    The 27th European Conference on Artificial Intelligence ECAI 2024 Main

    Jinsung Kim(*), Seonmin Koo(*), and Heuiseok Lim

  • KNOTICED: A Dataset for Critical Error Detection in English-Korean Machine Translation Considering Cross-Cultural Factors in English-Korean Translation

    The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation LREC-COLING 2024

    Sugyeong Eo, Jungwoo Lim, Chanjun Park, Dahyun Jung, Seonmin Koo, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim

  • KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing

    The 2023 Conference on Empirical Methods in Natural Language Processing EMNLP 2023 Main

    Seonmin Koo(*), Chanjun Park(*), Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, and Heuiseok Lim

  • Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline

    International Conference on Machine Learning (ICML) 2023 – DataPerf (workshop) ICML 2023 Workshop

    Seonmin Koo(*), Chanjun Park(*), Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

  • Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction

    International Conference on Machine Learning (ICML) 2023 – DataPerf (workshop) ICML 2023 Workshop

    Chanjun Park(*), Seonmin Koo(*), Seolhwa Lee(*), Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

  • A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation

    The 2022 Conference of the North American Chapter of the Association for Computational Linguistics NAACL 2022 Findings

    Jaehyung Seo, Seounghoon Lee, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim

International Journal

  • A Large-Scale Dataset for Korean Document-level Relation Extraction from Encyclopedia Texts

    Applied Intelligence, open access, 2024

    Suhyune Son, Jungwoo Lim, Seonmin Koo, Jinsung Kim, Younghoon Kim, Youngsik Lim, Dongseok Hyun, and Heuiseok Lim

  • A Multi-Faceted Exploration Incorporating Question Difficulty in Knowledge Tracing for English Proficiency Assessment

    Electronics, 12(19), 4171, 2023

    Jinsung Kim(*), Seonmin Koo(*), and Heuiseok Lim

  • Doubts on the reliability of parallel corpus filtering

    Expert Systems with Applications (ESWA), 233, 120962, 2023

    Hyeonseok Moon, Chanjun Park, Seonmin Koo, Jungseob Lee, Seungjun Lee, Jaehyung Seo, Sugyeong Eo, Yoonna Jang, Hyunjoong Kim, Hyoung-gyu Lee, Heuiseok Lim

  • Uncovering the Risks and Drawbacks Associated with the Use of Synthetic Data for Grammatical Error Correction

    IEEE Access, 2023

    Seonmin Koo, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

  • A Survey on Evaluation Metrics for Machine Translation

    Mathematics, 2023

    Seungjun Lee, Jungseob Lee, Hyeonseok Moon, Chanjun Park, Jaehyung Seo, Sugyeong Eo, Seonmin Koo, Heuiseok Lim

  • K-NCT: Korean Neural Grammatical Error Correction Gold-Standard Test Set Using Novel Error Type Classification Criteria

    IEEE Access, 2022

    Seonmin Koo(*), Chanjun Park(*), Jaehyung Seo, Seungjun Lee, Hyeonseok Moon, Jungseob Lee, Heuiseok Lim

Domestic Conference

  • Examining the Ability of Large Language Model on Entity-Based Korean Question-Answering (엔티티 기반 추론을 통한 거대 언어 모델의 한국어 질의응답 능력 연구)

    The 36th Annual Conference on Human & Cognitive Language Technology (HCLT 2024, 한글 및 한국어정보처리 학술대회)

    Seonmin Koo, Jinsung Kim, Chanjun Park, Kinam Park, and Heuiseok Lim

  • Exploring Korean Question Answering Ability of Large Language Model by Question Corruption (질의 변형에 따른 거대 언어모델의 한국어 질의응답 능력 연구)

    The 36th Annual Conference on Human & Cognitive Language Technology (HCLT 2024, 한글 및 한국어정보처리 학술대회)

    Jinsung Kim, Seonmin Koo, Kinam Park, and Heuiseok Lim

  • Examining the Feasibility of Utilizing a Large Language Model for Korean Grammatical Error Correction (한국어 맞춤법 교정을 위한 초거대 언어 모델의 잠재적 능력 탐색)

    The 35th Annual Conference on Human & Cognitive Language Technology (HCLT 2023, 한글 및 한국어정보처리 학술대회)

    Seonmin Koo, Chanjun Park, JeongBae Park, and Heuiseok Lim

  • Automatic Generation of Training Data for Korean Speech Recognition Post-Processor (한국어 음성인식 후처리기를 위한 학습 데이터 자동 생성 방안)

    The 34th Annual Conference on Human & Cognitive Language Technology (HCLT 2022, 한글 및 한국어정보처리 학술대회)

    Seonmin Koo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Sugyeong Eo, Yuna Hur, and Heuiseok Lim

  • KoCED: English-Korean Critical Error Detection Dataset (KoCED: 윤리 및 사회적 문제를 초래하는 기계번역 오류 탐지를 위한 학습 데이터셋)

    The 34th Annual Conference on Human & Cognitive Language Technology (HCLT 2022, 한글 및 한국어정보처리 학술대회)

    Sugyeong Eo, Suwon Choi, Seonmin Koo, Dahyun Jung, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Jeongbae Park, and Heuiseok Lim

  • Error Type Categorization for deep learning-based Korean Spelling Correction (딥러닝 기반 한국어 맞춤법 교정 연구를 위한 오류 유형 분류)

    Korea Software Congress 2021 (KSC2021, 한국소프트웨어종합학술대회)

    Seonmin Koo, Chanjun Park, and Heuiseok Lim

Domestic Journal

  • Classification and analysis of error types for deep learning-based Korean spelling correction (딥러닝 기반 한국어 맞춤법 교정을 위한 오류 유형 분류 및 분석)

    Journal of the Korea Convergence Society, 12(12), 65-74.

    Seonmin Koo, Chanjun Park, Aram So, and Heuiseok Lim

Projects

  • Hyundai Motor Company (2024-Present)
    • Development of TableQA model and framework
    • In collaboration with the Infotainment department of Hyundai Motor Company
  • KT AI/LLM Benchmark (2024-Present)
    • Building LLM benchmark datasets in Korean
    • Designing appropriate evaluation methods for the benchmark datasets and evaluating performance
    • In collaboration with KT
  • Miri Canvas Company (2024-2024)
    • Development of a model for extracting compatible elements
    • A model that identifies the images (elements) most compatible with the surrounding design elements, based on the elements returned for the input text
    • In collaboration with Miri Canvas
  • CYD ASR Post-Processing (2022-2024)
    • Generating post-processor data through an automatic parallel corpus generation methodology and a noisy method
    • Developing an ASR post-processor that improves performance by correcting speech recognition results
  • NC Soft Persona Dialogue System (2023-2023)
    • Persona-grounded dialogue generation framework development
    • Personal information and dialogue history datastore construction w/ kNN method
    • kNN datastore + PLM inference (e.g., DialoGPT, GODEL)
    • Persona data augmentation w/ knowledge base and LLM
    • In collaboration with Language Understanding team, Dialogue team, and LLM team of NC Soft
  • KIGAM Mineral Deposit Prediction Project (2023-2023)
    • Development of Ni, Co, Li mineral deposit prediction model using AI
    • In collaboration with Korea Institute of Geoscience and Mineral Resources (KIGAM)
  • Naver Corporation (2022-2023)
    • Korean document-based relation extraction (RE) framework development
    • Semi-automated RE data construction method design
    • Model training w/ Naver encyclopedia
    • In collaboration with Naver Encyclopedia team
  • Naver Papago (2021-2022)
    • Development of high-performance parallel corpus filtering technology
    • A parallel corpus filtering methodology that automatically selects and removes data unsuitable for training
    • In collaboration with Naver Papago team

Awards and Honors

  • Received Korea University Best Paper Award 2023

A Little More About Me

  • Certificates
    • National Science and Technology Big Data Analysis Article (2021)
    • SQLD: Structured Query Language Developer (2020)
    • ADsP: Advanced Data Analytics Semi-Professional (2019)
  • Hobbies
    • Must-go Restaurant Tour
    • Taekwondo