Ashutosh Chaubey

About Me

I am a CS PhD student at the Institute for Creative Technologies, University of Southern California, where I am advised by Prof. Mohammad Soleymani at the Intelligent Human Perception Lab. I am a Bronze Medallist from the 2021 batch of Indian Institute of Technology, Roorkee.

My PhD research focuses on post-training techniques, such as preference optimization, for multimodal (audio/video/omni) LLMs to improve their social and emotion understanding. I also collaborate on projects related to video generation for social behaviors.

Prior to this, I was a Founding Research Engineer at Anoki AI, where I worked on multimodal content understanding and retrieval. I have also worked at LG Ad Solutions on speaker recognition, audio-based automatic content recognition, and voice cloning. In the past, I have interned at Adobe Research with Sumit Shekhar, at the Vision and AI Lab, IISc Bengaluru, with Prof. R. Venkatesh Babu, and at IIT Roorkee with Prof. R. Balasubramanian.

I am looking for Research/Applied Scientist internship positions for Summer 2026 on multimodal LLMs (audio/visual/omni) and video generation. Please reach out if you have open positions.

Masters/Undergrad Students

If you are a student and would like to discuss my papers or how to apply for a PhD program in the US, please email me at achaubey at usc dot edu.

For students who wish to join our lab, please check our lab's open positions.

Areas of Interest

Multimodal LLM Tuning and Post-training, Emotion understanding, Social AI

News

Feb '26 - MoD-DPO accepted at CVPR 2026! Denver, here I come!
Jan '26 - AVERE accepted at ICLR 2026! See you in Rio de Janeiro! 🇧🇷
Dec '25 - LibreFace 2.0 accepted at FG 2026. See you in Kyoto! 🇯🇵
Sep '25 - Face-LLaVA accepted at WACV 2026 in Round-1 early acceptance. See you in Tucson!
Aug '25 - One paper accepted at EMNLP 2025 (Findings). Preprint available here.
Jun '25 - DiTaiListener accepted at ICCV 2025. See you in Hawaii!
Oct '24 - One paper accepted at WACV 2025! Congrats to my previous team at Anoki AI!
Aug '24 - I joined the Intelligent Human Perception Lab @ Institute for Creative Technologies, USC.
Apr '24 - I will be starting my PhD @ University of Southern California this Fall. Fight on!
Sep '23 - One paper has been accepted at ASRU, 2023. See you in Taipei!
Apr '23 - I have joined Anoki AI as a Founding Research Engineer.
Sep '22 - I will be at Interspeech 2022 in Incheon, Korea.
Jun '22 - One paper has been accepted at Interspeech 2022!
Jul '21 - I have started my industry experience by joining LG Ad Solutions as a Data Scientist. On to new challenges!

Research & Publications

MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization

Ashutosh Chaubey, Jiacheng Pang, Mohammad Soleymani

CVPR, 2026

Arxiv / Project Page

Multimodal LLMs Computer Vision Reinforcement Learning Audio

AVERE: Improving Audiovisual Emotion Reasoning with Preference Optimization

Ashutosh Chaubey, Jiacheng Pang, Maksim Siniukov, Mohammad Soleymani

ICLR, 2026

Paper / Arxiv / Code / Weights / Benchmark / Project Page

Multimodal LLMs Computer Vision Reinforcement Learning Audio

LibreFace 2.0: Leveraging Large-Scale Synthetic Data for Fair and Generalizable Facial Analysis

Xulang Guan*, Ashutosh Chaubey*, Maksim Siniukov, Belle Hsieh, Zongjian Li, Mohammad Soleymani

FG, 2026 (Round 1)

Arxiv (Soon!) / Code

Computer Vision Face Analysis

Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning

Ashutosh Chaubey, Xulang Guan, Mohammad Soleymani

WACV, 2026 (Round 1)

Paper / Arxiv / Code / Weights / Dataset / Project Page

Multimodal LLMs Computer Vision Face Analysis

Can VLMs Recall Factual Associations From Visual References?

Dhananjay Ashok, Ashutosh Chaubey, Hirona Arai, Jonathan May, Jesse Thomason

EMNLP, 2025 (Findings)

Paper / Arxiv / Code

Multimodal LLMs Computer Vision

DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion

Maksim Siniukov*, Di Chang*, Minh Tran, Hongkun Gong, Ashutosh Chaubey, Mohammad Soleymani

ICCV, 2025

Paper / Arxiv / Project Page

Multimodal LLMs Computer Vision Audio Video Generation

ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising

Ashutosh Chaubey, Anoubhav Agarwaal, Sartaki Sinha Roy, Aayush Agrawal, Susmita Ghose

WACV, 2025

Paper / Arxiv / Poster

Multimodal LLMs Computer Vision

Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition

Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose

IEEE ASRU, 2023

Paper / Poster

Audio

Improved Relation Networks for End-to-End Speaker Verification and Identification

Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose

Interspeech, 2022

Paper / Poster

Audio

OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis

Sumit Shekhar, Bhanu Prakash Reddy Guda, Ashutosh Chaubey, Ishan Jindal, Avneet Jain

CVPR Workshops, 2022

Paper / Patent

Reinforcement Learning Computer Vision

Universal Adversarial Perturbations: A Survey

Ashutosh Chaubey*, Nikhil Agrawal*, Kavya Barnwal, Keerat K. Guliani, Pramod Mehta

Survey paper, arXiv 2020

Paper

Computer Vision

Education

University of Southern California – PhD, Computer Science (2024 - Present), GPA: 4.0/4.0

Graduate Researcher – Intelligent Human Perception Lab, Institute for Creative Technologies

Indian Institute of Technology Roorkee – BS, Computer Science (2017 - 2021), GPA: 9.718/10

Chair - ACM IIT Roorkee Chapter | Co-President - Vision and Language Group

Academic Duties

Reviewer

ECCV (2026)
ICML (2026)
CVPR (2026)
BMVC (2026)
ICCV (2025)
WACV (2026)
FG (2026)

Teaching Assistant

CSCI 561 – Foundations of Artificial Intelligence (USC, Fall 2025)
CSCI 535 – Multimodal Probabilistic Learning of Human Communication (USC, Spring 2026)