Publications


Research & Publications

For the most up-to-date list, please visit my Google Scholar profile.

* denotes equal contribution. My name is in bold.

2026

  1. Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox
    Jiacheng Pang*, Ashutosh Chaubey*, Mohammad Soleymani
    ICML, 2026
    Multimodal LLMs, Audio
  2. Push, Pop, Parallelize: Stack-Augmented Linear Attention via the Delta Rule
    Anh T Nguyen, Saleh Momeni, Ashutosh Chaubey, Changnan Xiao, Bing Liu
    ICML, 2026
    Multimodal LLMs
  3. MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization
    Ashutosh Chaubey, Jiacheng Pang, Mohammad Soleymani
    CVPR, 2026
    Multimodal LLMs, Computer Vision, Reinforcement Learning, Audio
  4. AVERE: Improving Audiovisual Emotion Reasoning with Preference Optimization
    Ashutosh Chaubey, Jiacheng Pang, Maksim Siniukov, Mohammad Soleymani
    ICLR, 2026
    Multimodal LLMs, Computer Vision, Reinforcement Learning, Audio
  5. LibreFace 2.0: Leveraging Large-Scale Synthetic Data for Fair and Generalizable Facial Analysis
    Xulang Guan*, Ashutosh Chaubey*, Maksim Siniukov, Belle Hsieh, Zongjian Li, Mohammad Soleymani
    FG, 2026 (Round 1)
    Computer Vision, Face Analysis
  6. Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning
    Ashutosh Chaubey, Xulang Guan, Mohammad Soleymani
    WACV, 2026 (Round 1)
    Multimodal LLMs, Computer Vision, Face Analysis
  7. Reasoning Improves Human Alignment in LLM Judgment and Choice
    Ala N. Tak, Amin Banayeeanzade, Anahita Bolourani, Fatemeh Bahrani, Ashutosh Chaubey, Sai Praneeth Karimireddy, Norbert Schwarz, Jonathan Gratch
    ICLR 2026 Workshop on Representational Alignment (Re^4-Align)
    Multimodal LLMs

2025

  1. Can VLMs Recall Factual Associations From Visual References?
    Dhananjay Ashok, Ashutosh Chaubey, Hirona Arai, Jonathan May, Jesse Thomason
    EMNLP (Findings), 2025
    Multimodal LLMs, Computer Vision
  2. DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion
    Maksim Siniukov*, Di Chang*, Minh Tran, Hongkun Gong, Ashutosh Chaubey, Mohammad Soleymani
    ICCV, 2025
    Computer Vision, Audio, Video Generation
  3. ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising
    Ashutosh Chaubey, Anoubhav Agarwaal, Sartaki Sinha Roy, Aayush Agrawal, Susmita Ghose
    WACV, 2025
    Multimodal LLMs, Computer Vision

2023

  1. Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition
    Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose
    IEEE ASRU, 2023
    Audio

2022

  1. Improved Relation Networks for End-to-End Speaker Verification and Identification
    Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose
    Interspeech, 2022
    Audio
  2. OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis
    Sumit Shekhar, Bhanu Prakash Reddy Guda, Ashutosh Chaubey, Ishan Jindal, Avneet Jain
    CVPR Workshops, 2022
    Reinforcement Learning, Computer Vision

2020

  1. Universal Adversarial Perturbations: A Survey
    Ashutosh Chaubey*, Nikhil Agrawal*, Kavya Barnwal, Keerat K. Guliani, Pramod Mehta
    Survey paper, arXiv, 2020
    Computer Vision

2019

  1. A Generative Adversarial Network Based Ensemble Technique for Automatic Evaluation of Machine Synthesized Speech
    Ashutosh Chaubey*, Jaynil Jaiswal*, Sasi Kiran Reddy Bhimvarapu, Shashank Kashyap, Puneet Kumar, Balasubramanian Raman, Partha Pratim Roy
    ACPR, 2019
    Audio

Preprints

  1. GDPO-Listener: Expressive Interactive Head Generation via Auto-Regressive Flow Matching and Group reward-Decoupled Policy Optimization
    Zhangyu Jin, Maksim Siniukov, Deuksin Kwon, Ashutosh Chaubey, Mohammad Soleymani
    arXiv preprint, 2026
    Computer Vision, Reinforcement Learning, Video Generation
  2. Sparks of Rationality: Do Reasoning LLMs Align with Human Judgment and Choice?
    Ala N. Tak, Amin Banayeeanzade, Anahita Bolourani, Fatemeh Bahrani, Ashutosh Chaubey, Sai Praneeth Karimireddy, Norbert Schwarz, Jonathan Gratch
    arXiv preprint, 2026
    Multimodal LLMs