JUNWOO HA

AI SAFETY
RESEARCHER

SPECIALIZATION

AI RED
TEAMING

01 — RESEARCH PHILOSOPHY

Breaking Systems to Build Better Ones

Red teaming is not just about attack—it's about defense.

True robustness comes from first principles, not band-aid fixes.

02 — PUBLICATIONS

Breaking Models, Building Defenses

Research papers
2025 · ACL 2025 Main Conference

M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs

96% ASR, 60%+ token reduction. Three templates: Hyphenize, Numberize, Pythonize.

Jailbreaks · Red Teaming · AI Safety · LLM Security · Alignment

2025 · Center for AI Safety & Scale AI

Humanity's Last Exam (HLE)

Challenging benchmark dataset for evaluating frontier LLM capabilities.

Benchmark · Evaluation · Reasoning · LLM Evaluation

2025 · NeurIPS 2025 Lock-LLM

X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates

Automated M2S template discovery with LLM-guided evolution. 44.8% success on GPT-4.1.

Jailbreaks · Red Teaming · AI Safety · LLM Security · Evolutionary Search · Evaluation

2025 · NeurIPS 2025 MTI-LLM

ObjexMT: Objective Extraction and Metacognitive Calibration for LLM-as-a-Judge under Multi-Turn Jailbreaks

LLM-as-a-Judge benchmark for hidden objective extraction. 47-61% accuracy across models.

LLM-as-a-Judge · Jailbreaks · Calibration · Metacognition · AI Safety · Evaluation

03 — WORK EXPERIENCE

Building at the Frontier

AIM Intelligence

Researcher & Product Manager · Oct 2024 – Sept 2025
  • First author, ACL 2025 Main Conference - M2S framework
  • Led KT Internal LLM Evaluation benchmarking initiative
  • Qualification Round Lead, KISA AI Hacking Defense Competition 2025
  • Red Teaming dataset initiative for major Korean telecom operator

Coupang Eats

Associate · Sept 2023 – Sept 2024
  • Built Python automation tools boosting productivity 10-300%
  • Developed menu-matching ML model (93-94% accuracy)
  • Automated Presto ↔ Google Sheets pipelines

EDUCATION

University of Seoul

Mathematics

Mar 2018 - Present

COMPETITIONS

LLM Jailbreak Challenge

1st place (solo)

AIM Intelligence

Independently developed a jailbreak solution that bypassed the model’s tightly engineered internal guidelines and successfully extracted protected internal keys from the LLM.

Adversarial Attack on Vision Models

2nd place

AIM Intelligence

Designed minimal image perturbations that caused the target vision model to predict “keep driving” despite a pedestrian ahead, bypassing its safety defenses.

04 — TECHNICAL EXPERTISE

AI SAFETY · RED TEAMING · PYTHON · NEXT.JS · FINE TUNING · PYTORCH · JAILBREAK · ADVERSARIAL ATTACK · GUARDRAILS · SQL · EVALUATION · AUTOMATION
TRUSTWORTHINESS · SAFE AI · HARMLESS · ALIGNMENT · RELIABILITY · INTERPRETABILITY · STEERABILITY · FIRST PRINCIPLES