Despite recent advances in diffusion models, AI-generated images still often contain
visual artifacts that compromise realism. While more extensive pre-training and
larger models may reduce artifacts, there is no guarantee of eliminating them
entirely, making artifact mitigation a crucial area of study.
Prior artifact-aware methods depend on human-labeled artifact datasets, which
are costly and difficult to scale. We propose ArtiAgent, an agentic
framework that efficiently creates pairs of real and artifact-injected images without
human intervention. It comprises three agents: a perception agent that
recognizes and grounds entities and subentities from real images; a synthesis
agent that introduces artifacts via novel patch-wise embedding manipulation within
a diffusion transformer; and a curation agent that filters synthesized
artifacts and generates local and global explanations for each instance.
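The three-agent pipeline described above can be sketched as follows. All class names, function signatures, and stub logic here are illustrative assumptions for exposition, not the paper's implementation; the actual agents operate on images and diffusion-transformer internals rather than the toy records used below.

```python
from dataclasses import dataclass

# Hypothetical sketch of the ArtiAgent pipeline; every name and stub
# below is an assumption made for illustration.

@dataclass
class Region:
    entity: str    # e.g. a grounded entity such as "hand"
    bbox: tuple    # (x, y, w, h) region in the real image

def perception_agent(image_id: str) -> list[Region]:
    # Stub: would recognize and ground entities/subentities in the image.
    return [Region("hand", (10, 20, 64, 64))]

def synthesis_agent(image_id: str, regions: list[Region]) -> list[dict]:
    # Stub: would inject an artifact into each grounded region via
    # patch-wise embedding manipulation inside a diffusion transformer.
    return [{"region": r, "artifact": "extra_finger"} for r in regions]

def curation_agent(candidates: list[dict]) -> list[dict]:
    # Stub: would filter failed syntheses and attach local/global
    # explanations for each retained instance.
    kept = [c for c in candidates if c["artifact"] is not None]
    for c in kept:
        c["explanation"] = f"{c['artifact']} in {c['region'].entity}"
    return kept

def artiagent(image_id: str) -> list[dict]:
    # Perception -> synthesis -> curation, yielding annotated pairs.
    regions = perception_agent(image_id)
    candidates = synthesis_agent(image_id, regions)
    return curation_agent(candidates)
```

Under this sketch, each real image yields curated artifact instances paired with per-instance explanations, which is the annotation format the 100K-image dataset below relies on.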
Using ArtiAgent, we synthesize 100K images with rich artifact
annotations and demonstrate both efficacy and versatility across diverse applications,
including fine-tuning open-source VLMs that consistently outperform proprietary systems
(GPT-5, Gemini-2.5-Pro) on artifact detection, localization, and explanation.