Publications – Ashutosh Chaubey

Research & Publications

For the most up-to-date list, please visit my Google Scholar profile.

* denotes equal contribution. My name is in bold.

2026

Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox

Jiacheng Pang*, Ashutosh Chaubey*, Mohammad Soleymani

ICML, 2026

Preprint (Coming Soon) / Project Page (Coming Soon)

Multimodal LLMs, Audio

Push, Pop, Parallelize: Stack-Augmented Linear Attention via the Delta Rule

Anh T Nguyen, Saleh Momeni, Ashutosh Chaubey, Changnan Xiao, Bing Liu

ICML, 2026

Preprint (Coming Soon) / Project Page (Coming Soon)

Multimodal LLMs

MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization

Ashutosh Chaubey, Jiacheng Pang, Mohammad Soleymani

CVPR, 2026

arXiv / Project Page

Multimodal LLMs, Computer Vision, Reinforcement Learning, Audio

AVERE: Improving Audiovisual Emotion Reasoning with Preference Optimization

Ashutosh Chaubey, Jiacheng Pang, Maksim Siniukov, Mohammad Soleymani

ICLR, 2026

Paper / arXiv / Code / Weights / Benchmark / Project Page

Multimodal LLMs, Computer Vision, Reinforcement Learning, Audio

LibreFace 2.0: Leveraging Large-Scale Synthetic Data for Fair and Generalizable Facial Analysis

Xulang Guan*, Ashutosh Chaubey*, Maksim Siniukov, Belle Hsieh, Zongjian Li, Mohammad Soleymani

FG, 2026 (Round 1)

Code

Computer Vision, Face Analysis

Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning

Ashutosh Chaubey, Xulang Guan, Mohammad Soleymani

WACV, 2026 (Round 1)

Paper / arXiv / Code / Weights / Dataset / Project Page

Multimodal LLMs, Computer Vision, Face Analysis

Reasoning Improves Human Alignment in LLM Judgment and Choice

Ala N. Tak, Amin Banayeeanzade, Anahita Bolourani, Fatemeh Bahrani, Ashutosh Chaubey, Sai Praneeth Karimireddy, Norbert Schwarz, Jonathan Gratch

ICLR 2026 Workshop on Representational Alignment (Re^4-Align)

Paper

Multimodal LLMs

2025

Can VLMs Recall Factual Associations From Visual References?

Dhananjay Ashok, Ashutosh Chaubey, Hirona Arai, Jonathan May, Jesse Thomason

EMNLP (Findings), 2025

Paper / arXiv / Code

Multimodal LLMs, Computer Vision

DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion

Maksim Siniukov*, Di Chang*, Minh Tran, Hongkun Gong, Ashutosh Chaubey, Mohammad Soleymani

ICCV, 2025

Paper / arXiv / Project Page

Computer Vision, Audio, Video Generation

ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising

Ashutosh Chaubey, Anoubhav Agarwaal, Sartaki Sinha Roy, Aayush Agrawal, Susmita Ghose

WACV, 2025

Paper / arXiv / Poster

Multimodal LLMs, Computer Vision

2023

Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition

Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose

IEEE ASRU, 2023

Paper / Poster

Audio

2022

Improved Relation Networks for End-to-End Speaker Verification and Identification

Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose

Interspeech, 2022

Paper / Poster

Audio

OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis

Sumit Shekhar, Bhanu Prakash Reddy Guda, Ashutosh Chaubey, Ishan Jindal, Avneet Jain

CVPR Workshops, 2022

Paper / Patent

Reinforcement Learning, Computer Vision

2020

Universal Adversarial Perturbations: A Survey

Ashutosh Chaubey*, Nikhil Agrawal*, Kavya Barnwal, Keerat K. Guliani, Pramod Mehta

Survey paper, arXiv 2020

Paper

Computer Vision

2019

A Generative Adversarial Network Based Ensemble Technique for Automatic Evaluation of Machine Synthesized Speech

Ashutosh Chaubey*, Jaynil Jaiswal*, Sasi Kiran Reddy Bhimvarapu, Shashank Kashyap, Puneet Kumar, Balasubramanian Raman, Partha Pratim Roy

ACPR, 2019

Paper

Audio

Preprints

GDPO-Listener: Expressive Interactive Head Generation via Auto-Regressive Flow Matching and Group reward-Decoupled Policy Optimization

Zhangyu Jin, Maksim Siniukov, Deuksin Kwon, Ashutosh Chaubey, Mohammad Soleymani

arXiv preprint, 2026

arXiv

Computer Vision, Reinforcement Learning, Video Generation

Sparks of Rationality: Do Reasoning LLMs Align with Human Judgment and Choice?

Ala N. Tak, Amin Banayeeanzade, Anahita Bolourani, Fatemeh Bahrani, Ashutosh Chaubey, Sai Praneeth Karimireddy, Norbert Schwarz, Jonathan Gratch

arXiv preprint, 2026

arXiv

Multimodal LLMs
