Jaehyun Park

I am a PhD candidate in the Data Mining Lab at KAIST, advised by Prof. Jae-Gil Lee. My research focuses on data-centric AI for LLM post-training, studied along two axes: how we train on supervision so that weak signals still yield useful learning, and where the supervision comes from (human, AI, or both).

Post-training data is naturally weak supervision.

We often describe LLMs as generative models, but in practice they predict the next token. For any prompt, infinitely many continuations may be valid, so any finite dataset captures only a subset of what counts as “correct”. This makes post-training data inherently weak supervision. My interest is in strengthening that weak signal by using what the model already knows.

Where humans fit in a world of synthetic data.

As synthetic data becomes more common, a key question remains: can we train LLMs without human input, and if not, where are humans still necessary? I’m interested in the differences between AI-generated and human supervision: the roles each plays, the mistakes each tends to make, and how each shapes model learning.

news

Feb 21, 2026	Our paper See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis is accepted at CVPR 2026! 🎉🥳
Jan 21, 2026	Our paper Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs is accepted at ICLR 2026 Workshop ICBINB! 🎉🥳
Mar 01, 2025	I am starting my Ph.D. program at the Data Mining Lab @ KAIST.
Feb 26, 2025	I am starting my research scientist internship at Krafton AI.
Jan 22, 2025	Our paper Active Learning for Continual Learning: Keeping the Past Alive in the Present is accepted at ICLR 2025! 🎉🥳

publications

CVPR

See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis

Jaehyun Park^*, Minyoung Ahn^*, Minkyu Kim, and 3 more authors

In The IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026

PDF Code Website

ICLRw

Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs

Giyeong Oh, Junghyun Lee, Jaehyun Park, and 3 more authors

In The International Conference on Learning Representations Workshop, 2026

PDF

ICLR

Active Learning for Continual Learning: Keeping the Past Alive in the Present

Jaehyun Park, Dongmin Park, and Jae-Gil Lee^†

In The International Conference on Learning Representations, 2025

NeurIPS

Exploiting Representation Curvature for Boundary Detection in Time Series

Yooju Shin, Jaehyun Park, Susik Yoon, and 3 more authors

In Advances in Neural Information Processing Systems, 2024

PDF