Minh Duc (David) Chu

PhD candidate at the USC Information Sciences Institute, advised by Luca Luceri (SIGNALS Lab) and Kristina Lerman.

Anthropic AI Safety Fellow (Summer 2026).

Incoming Research Engineer Intern — AI Safety & Alignment, Character.AI (Fall 2026).

From Vũng Tàu, Việt Nam 🇻🇳. Based in Los Angeles, CA 🇺🇸.

About

I'm a Computer Science PhD candidate at the USC Information Sciences Institute, advised by Luca Luceri (SIGNALS Lab) and Kristina Lerman. I'm currently an Anthropic AI Safety Fellow, and this fall I'll join Character.AI as a Research Engineer Intern on the AI Safety & Alignment team.

I started at USC in 2023 and expect to defend in late 2027. Before that, I earned my B.A. with Distinction in Computer Science & Statistics from Carleton College in Minnesota.

My research sits where AI safety and alignment meet mental health. One side is behavioral: black-box audits of how conversational AI acts over many turns, and of how those interactions can compound into harm such as emotional dependency, belief spirals, and eating-disorder reinforcement, especially for teens and other vulnerable users. With clinicians and social scientists, I use concrete evaluations, red-teaming harnesses, and post-training safeguards for these risks.

The other side is more fundamental, and the question I most want to answer: for AI to harm people less, to actively mitigate harm, and maybe even leave people better off, especially around emotional manipulation and mental health, it first has to understand the complex psychological and social constructs we live by, like emotions, intimacy, and mental health itself. I work toward this from several angles: mechanistic interpretability, conceptual reasoning, behavioral study, and collaboration with psychologists and psychiatrists.

I grew up in Vũng Tàu on the coast of southern Việt Nam, and now live in Los Angeles. Outside research I box and play tennis.

Research

I work on AI alignment and safety, with a current focus on the shift from AI-as-assistant to AI-as-companion.

AI Alignment & Safety · Socio-technical Alignment · Human–AI Companionship · Model Psychology · Model Welfare · Interpretability · Character Training · Social NLP

01
Assistant → Companion
What changes about safety when people stop using LLMs and start confiding in them.
02
Psychology, Welfare & Interpretability
Raising a model's EQ: the traits, drives, and failure modes models develop, how complex constructs get encoded as directions inside them, and how the way we treat models may carry downstream weight.
Currently: Part 2 of Anthropic's functional emotion vectors
03
Character Training
How voice, values, and refusals get baked in at scale.
Fall 2026: at Character.AI, AI Safety & Alignment team
04
Aligning to Communities
Tuning LLMs to specific online communities without flattening their language or norms.
05
Computational Social Science
Surfacing harm patterns — body image, eating disorders — across Twitter, Reddit, TikTok.

Skills

The tools I reach for — and, for language models, the four phases I actually work across.

Programming

Python · C++ · C · Java · JavaScript · CUDA · Docker

Machine Learning

PyTorch · Hugging Face · TensorFlow · Keras · scikit-learn · OpenCV · R

Statistics

Bayesian Inference · Probability · Time Series · Spatial Statistics · Sampling · Visualization

Language Models

01

Training

SFT · RLHF · RLAIF · Preference Optimization (DPO, GRPO) · Constitutional AI · Character Training · Distributed Training (FSDP, DeepSpeed) · Multimodal / VLMs

02

Interpretability

Sparse Autoencoders · Activation Steering · Linear Probing · Logit Lens · Representation Geometry · Circuit & Attention Analysis · Feature Attribution

03

Evaluation & Oversight

LLM-as-a-Judge · Red-teaming · Behavioral & Psychometric Evals · Scalable Oversight (Debate, Weak-to-Strong) · Inspect AI · Petri

04

Simulation

Multi-Agent Systems · Persona & Community Simulation · Agent-based Info-Ops · AutoGen · RAG

News

Last updated · Jul 2026

Publications

Google Scholar

01
2026
When Chatbots Accommodate: What AI Companions Optimize for in Vulnerable Conversations
Minh Duc Chu, Yifan Wu, Zhiyi Chen, Angel Hsing-Chi Hwang, Luca Luceri
arXiv preprint
02
2026
BigTokDetect: A Clinically-Informed Vision–Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok
Minh Duc Chu, Kshitij Pawar, Zihao He, Roxanna Sharifi, Ross M. Sonnenblick, Magdalayna Curry, Laura D'Adamo, Lindsay Young, Stuart Murray, Kristina Lerman
EACL 2026
03
2026
Tied In on TikTok: Tie Strength and Emotional Dynamics in Algorithmic Communities
Charles Bickham, Minh Duc Chu, Arianna Yuan, Valerie Lookingbill, Ehsan Mohammadi, Stuart Murray, Kristina Lerman, Emilio Ferrara
ICWSM 2026
04
2026
Illusions of Intimacy: How Emotional Dynamics Shape Human–AI Relationships
Minh Duc Chu, Patrick Gerard, Kshitij Pawar, Charles Bickham, Kristina Lerman
ACII 2026
05
2025
Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities
Minh Duc Chu, Zihao He, Rebecca Dorn, Kristina Lerman
NAACL 2025
06
2024
Community-Cross-Instruct: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities
Zihao He, Minh Duc Chu, Rebecca Dorn, Siyi Guo, Kristina Lerman
EMNLP 2024

Services

Reviewing, organizing, and teaching across the NLP and computational social science communities.

Program Committee / Reviewer

NAACL ACL EMNLP AACL EACL CSCW WWW ICWSM ACII IJHCS (journal)

ICWSM '26: Local Chair

Teaching

Contact

Happy to hear from anyone working on AI alignment, model welfare, or social NLP — and from anyone in Los Angeles looking for a tennis partner.

Email

mhchu [at] usc [dot] edu (USC)

dchu [at] isi [dot] edu (ISI)

davidchu11381 [at] gmail [dot] com (Personal)

Elsewhere

Google Scholar

ACL Anthology

GitHub
