Chirag Agarwal

340 posts

@_cagarwal

Assistant Professor @UVA; PI of Aikyam Lab; Prev: @Harvard, @Adobe, @BoschGlobal, @thisisUIC; Increasing the sample size of my thoughts

Joined November 2013
570 Following · 1.9K Followers
Pinned Tweet
Chirag Agarwal @_cagarwal ·
Excited to share CURE-Med, our new work on making LLMs reliable for medical reasoning across different languages 🌍 In healthcare, models can’t just be great in English and then collapse when deployed in new languages. They need to adapt to new tasks (languages, dialects, medical contexts) without catastrophic forgetting of what they already know. We tackle this head-on with curriculum-informed RL. Datasets and models on Hugging Face: huggingface.co/Aikyam-Lab Website: cure-med.github.io
1 reply · 0 reposts · 7 likes · 530 views
Chirag Agarwal @_cagarwal ·
Special shoutout to @EricOnyame and Akash Ghosh for leading this work and thanks to the amazing co-authors Subhadip Baidya, Xiuying Chen (@mbzuai), and Sriparna Saha (@iitpatna).
0 replies · 0 reposts · 0 likes · 73 views
Chirag Agarwal @_cagarwal ·
Our scaling results from 1.5B to 32B show consistent improvements: CURE-Med outperforms medical LLMs on OOD benchmarks, and human evaluations confirm its robustness!
1 reply · 0 reposts · 2 likes · 113 views
Chirag Agarwal reposted
Yanran Li @Lyric97243419 ·
Do VLMs actually 'see' or just rely on priors? Fascinating talk by Chirag Agarwal at #AAAI2026 on Trustworthy Multimodal AI. He showed how models fail to count stripes on a shoe simply because they recognize the 'Adidas' logo and hallucinate the standard 3 stripes. 👟 @RealAAAI
3 replies · 9 reposts · 89 likes · 5.7K views
Chirag Agarwal reposted
Anh Totti Nguyen @anh_ng8 ·
As LLMs take on higher-stakes questions, the verification burden on users grows! Long-form responses make it hard for users to parse them and spot errors 🤯 💡Ask LLMs to generate responses in an ▻ interactive ◄ web app that is itself LLM-generated, to aid user verification! 👩🏻‍💻
Chirag Agarwal @_cagarwal

Excited to share our new work on verifying LLM reasoning! Everyone loves Chain-of-Thought (CoT): LLMs can generate impressive, step-by-step solutions. But when they make a mistake, can a human actually find it quickly? The answer is: no, not easily.

1 reply · 1 repost · 7 likes · 588 views
Chirag Agarwal @_cagarwal ·
Looking forward to attending #AAAI2026 next week! Honored to be part of the New Faculty Highlights program, where I'll speak on "Advancing Trust in Multimodal AI". We also have an Oral presentation on polarity-aware probing for quantifying latent alignment in LLMs. Excited to share our findings on better understanding and aligning model behavior. Let's chat if you're there! Paper: arxiv.org/abs/2511.21737
1 reply · 0 reposts · 10 likes · 228 views
Chirag Agarwal reposted
Guide Labs @guidelabsai ·
Excited to announce our $9M USD seed round led by @Initialized and the first large-scale interpretable LLM: an 8 billion parameter model capable of explaining its outputs through mechanisms humans can actually understand.
5 replies · 17 reposts · 41 likes · 9.8K views
Chirag Agarwal @_cagarwal ·
“I have a large, aligned, and safe model” — No One. Had a great time speaking about Robust Unsupervised Probing Frameworks to evaluate the alignment of language models in the context of AI Safety. Paper: arxiv.org/pdf/2511.21737
FAR.AI @farairesearch

Day 1 of the San Diego Alignment Workshop on frontier alignment, evals, control and mech interp. @Yoshua_Bengio @ARGleave @sleepinyourhat @majatrebacz @AnkaReuel @daniel_d_kang @adamfungi @_cagarwal @natashajaques and more. Day 2 tomorrow! 👇 Recordings coming soon. Follow us to stay updated! In the meantime, check out past talks on YouTube: buff.ly/KxksFNJ

0 replies · 1 repost · 17 likes · 1.4K views
Chirag Agarwal @_cagarwal ·
Heading to NeurIPS next week!! Looking forward to presenting our latest work on understanding latent alignment in multimodal LLMs, and to forming new collaborations. Please join us at the Regulatable ML Workshop on December 7th. This is a crucial area, and we are excited to bring the community together to discuss the path forward for compliant ML deployment. Also, @uvadatascience is hiring PhD students and faculty. If you are on the job market or know someone who is, please feel free to reach out; happy to share details on hiring, admissions, and the exciting research we are doing. Link: regulatableml.github.io
0 replies · 1 repost · 8 likes · 356 views
Chirag Agarwal reposted
Gautam Kamath @thegautamkamath ·
I am recruiting PhD students at @NYU_Courant to conduct research in learning theory, algorithmic statistics, and trustworthy machine learning, starting Fall 2026. Please share widely! Deadline to apply is December 12, 2025.
11 replies · 131 reposts · 582 likes · 80.3K views
Chirag Agarwal @_cagarwal ·
My first AC experience at ICLR has been eye-opening. In the relentless race to publish more papers, I realize we (authors and reviewers) are losing our love for research and forgetting that everyone is both an author and a reviewer.
Peter Richtarik @peter_richtarik

I am an AC for ICLR 2026. One of the papers in my batch was just withdrawn. The authors wrote a brief response, explaining why the reviewers failed at their job. I agree with most of their comments. The authors gave up. They are fed up. Just like many of us. I understand. We pretend the emperor has clothes, but he is naked. Here is the final part of their withdrawal notice. I took the liberty to make it public, to highlight that what we have been doing with AI conference reviews these last few years is, basically, madness.

---

Comment: We thank the reviewers for their time. However, upon reading the reviews for our paper, it became immediately apparent that the four "reject" ratings are not based on good-faith academic disagreement, but on a critical failure to read the submitted paper. The reviews are rife with demonstrably false claims that are directly contradicted by the text. The core justifications for rejection rely on asserting that key components are "missing" when they are explicitly detailed in the manuscript. Some specific examples (several of which are outright fabrications):

Claim: Harder tasks like GSM8K are missing. Fact: GSM8K results appear in many tables, e.g. Table 2 (Section 4.2) and Appendix G.
Claim: The method does not use per-layer ranks. Fact: This is the entire point of our method; the reviewer clearly mistook our method for the baselines (Section 2, Table 1).
Claim: The GP kernel is not specified. Fact: It is specified in Appendix E (Table 6).
Claim: There is no ablation of the method's three stages. Fact: Section 4.4 ("Ablation Study") and Appendix J are dedicated to this.

Reviewers have a fundamental responsibility to read and evaluate the work they are assigned. The nature of these errors is so fundamental, so systemic in overlooking explicit content, that it goes far beyond what "limited time" or "oversight" can explain. This work has gone through several rounds of revision over the last year. In earlier submissions, the paper usually received borderline or weak-accept scores. Numerous signs strongly suggest that some reviewers are relying entirely on AI tools to automatically generate peer reviews, rather than fulfilling their fundamental responsibility of personally reading and evaluating manuscripts. We strongly protest this. It is a gross disrespect to the authors, a flagrant desecration of the reviewer's duty, and it fundamentally undermines the integrity of the entire peer-review process. Given that the reviews are not based on the actual content of our paper, we have decided to withdraw the submission. We leave this comment so that future readers of the OpenReview page are aware that the items described as "missing" are already present in the submitted manuscript. The negative reviews for this submission are factually unsound and do not reflect the content of the paper. We cannot and will not accept an assessment that is not based on the work we actually submitted.

0 replies · 0 reposts · 8 likes · 652 views
Peter Richtarik @peter_richtarik ·
@_cagarwal I love research as much as ever. But I despise nonsense. And most of the reviews everyone is getting at these big conferences are nonsense.
1 reply · 0 reposts · 1 like · 42 views