Chirag Agarwal

340 posts

@_cagarwal

Assistant Professor @UVA; PI of Aikyam Lab; Prev: @Harvard, @Adobe, @BoschGlobal, @thisisUIC; Increasing the sample size of my thoughts

Joined November 2013
570 Following · 1.9K Followers
Pinned Tweet
Chirag Agarwal @_cagarwal ·
Excited to share CURE-Med, our new work on making LLMs reliable for medical reasoning across different languages 🌍 In healthcare, models can’t just be great in English and then collapse when deployed in new languages. They need to adapt to new tasks (languages, dialects, medical contexts) without catastrophic forgetting of what they already know. We tackle this head-on with curriculum-informed RL. Datasets and models on Hugging Face: huggingface.co/Aikyam-Lab Website: cure-med.github.io
1 reply · 0 reposts · 7 likes · 530 views
Chirag Agarwal @_cagarwal ·
Special shoutout to @EricOnyame and Akash Ghosh for leading this work and thanks to the amazing co-authors Subhadip Baidya, Xiuying Chen (@mbzuai), and Sriparna Saha (@iitpatna).
0 replies · 0 reposts · 0 likes · 73 views
Chirag Agarwal @_cagarwal ·
Our scaling results from 1.5B to 32B show consistent improvements: CURE-Med outperforms medical LLMs on OOD benchmarks, and human evaluations confirm its robustness!
1 reply · 0 reposts · 2 likes · 113 views
Chirag Agarwal reposted
Yanran Li @Lyric97243419 ·
Do VLMs actually 'see' or just rely on priors? Fascinating talk by Chirag Agarwal at #AAAI2026 on Trustworthy Multimodal AI. He showed how models fail to count stripes on a shoe simply because they recognize the 'Adidas' logo and hallucinate the standard 3 stripes. 👟 @RealAAAI
3 replies · 9 reposts · 89 likes · 5.7K views
Chirag Agarwal reposted
Anh Totti Nguyen @anh_ng8 ·
As LLMs take on higher-stakes questions, the verification burden on users grows! Long-form responses make it hard for users to parse them and spot errors 🤯 💡Ask LLMs to generate responses in an ▻ interactive ◄ web app that is itself LLM-generated, to aid user verification! 👩🏻‍💻
Chirag Agarwal @_cagarwal

Excited to share our new work on verifying LLM reasoning! Everyone loves Chain-of-Thought (CoT): LLMs can generate impressive, step-by-step solutions. But when they make a mistake, can a human actually find it quickly? The answer is: no, not easily.

1 reply · 1 repost · 7 likes · 588 views
Chirag Agarwal @_cagarwal ·
Looking forward to attending #AAAI2026 next week! Honored to be part of the New Faculty Highlights program, where I'll speak on "Advancing Trust in Multimodal AI". We also have an Oral presentation on polarity-aware probing for quantifying latent alignment in LLMs. Excited to share our findings on better understanding and aligning model behavior. Let's chat if you're there! Paper: arxiv.org/abs/2511.21737
1 reply · 0 reposts · 10 likes · 228 views
Chirag Agarwal reposted
Guide Labs @guidelabsai ·
Excited to announce our $9M USD seed round led by @Initialized and the first large-scale interpretable LLM: an 8 billion parameter model capable of explaining its outputs through mechanisms humans can actually understand.
5 replies · 17 reposts · 41 likes · 9.8K views
Chirag Agarwal @_cagarwal ·
“I have a large, aligned, and safe model” — No One. Had a great time speaking about Robust Unsupervised Probing Frameworks to evaluate the alignment of language models in the context of AI Safety. Paper: arxiv.org/pdf/2511.21737
FAR.AI @farairesearch

Day 1 of the San Diego Alignment Workshop on frontier alignment, evals, control and mech interp. @Yoshua_Bengio @ARGleave @sleepinyourhat @majatrebacz @AnkaReuel @daniel_d_kang @adamfungi @_cagarwal @natashajaques and more. Day 2 tomorrow! 👇 Recordings coming soon. Follow us to stay updated! In the meantime, check out past talks on YouTube: buff.ly/KxksFNJ

0 replies · 1 repost · 17 likes · 1.4K views
Chirag Agarwal @_cagarwal ·
Heading to NeurIPS next week!! Looking forward to presenting our latest work on understanding latent alignment in multimodal LLMs, and to forming new collaborations. Please join us at the Regulatable ML Workshop on December 7th. This is a crucial area, and we are excited to bring the community together to discuss the path forward for compliant ML deployment. Also, @uvadatascience is hiring PhD students and faculty. If you are on the job market or know someone who is, please feel free to reach out; happy to share details on hiring, admissions, and the exciting research we are doing. Link: regulatableml.github.io
0 replies · 1 repost · 8 likes · 356 views
Chirag Agarwal reposted
Gautam Kamath @thegautamkamath ·
I am recruiting PhD students at @NYU_Courant to conduct research in learning theory, algorithmic statistics, and trustworthy machine learning, starting Fall 2026. Please share widely! Deadline to apply is December 12, 2025.
11 replies · 131 reposts · 582 likes · 80.3K views
Chirag Agarwal @_cagarwal ·
My first AC experience at ICLR has been eye-opening. In the relentless race to publish more papers, I realize we (authors and reviewers) are losing our love for research and forgetting that everyone is both an author and a reviewer.
Peter Richtarik @peter_richtarik

I am an AC for ICLR 2026. One of the papers in my batch was just withdrawn. The authors wrote a brief response, explaining why the reviewers failed at their job. I agree with most of their comments. The authors gave up. They are fed up. Just like many of us. I understand. We pretend the emperor has clothes, but he is naked. Here is the final part of their withdrawal notice. I took the liberty to make it public, to highlight that what we have been doing with AI conference reviews these last few years is, basically, madness.

---

Comment: We thank the reviewers for their time. However, upon reading the reviews for our paper, it became immediately apparent that the four "reject" ratings are not based on good-faith academic disagreement, but on a critical failure to read the submitted paper. The reviews are rife with demonstrably false claims that are directly contradicted by the text. The core justifications for rejection rely on asserting that key components are "missing" when they are explicitly detailed in the manuscript. Some specific examples (several of which are outright fabrications):

Claim: Harder tasks like GSM8K are missing. Fact: GSM8K results appear in many tables, e.g. Table 2 (Section 4.2) and Appendix G.
Claim: The method does not use per-layer ranks. Fact: This is the entire point of our method; the reviewer clearly mistook our method for the baselines (Section 2, Table 1).
Claim: The GP kernel is not specified. Fact: It is specified in Appendix E (Table 6).
Claim: There is no ablation of the method's three stages. Fact: Section 4.4 ("Ablation Study") and Appendix J are dedicated to this.

Reviewers have a fundamental responsibility to read and evaluate the work they are assigned. The nature of these errors is so fundamental, so systemic in overlooking explicit content, that it goes far beyond what "limited time" or "oversight" can explain. This work has gone through several rounds of revision over the last year. In earlier submissions, the paper usually received borderline or weak-accept scores. Numerous signs strongly suggest that some reviewers are relying entirely on AI tools to automatically generate peer reviews, rather than fulfilling their fundamental responsibility of personally reading and evaluating manuscripts. We strongly protest this. It is a gross disrespect to the authors, a flagrant desecration of the reviewer's duty, and it fundamentally undermines the integrity of the entire peer-review process. Given that the reviews are not based on the actual content of our paper, we have decided to withdraw the submission. We leave this comment so that future readers of the OpenReview page are aware that the items described as "missing" are already present in the submitted manuscript. The negative reviews for this submission are factually unsound and do not reflect the content of the paper. We cannot and will not accept an assessment that is not based on the work we actually submitted.

0 replies · 0 reposts · 8 likes · 652 views
Peter Richtarik @peter_richtarik ·
@_cagarwal I love research as much as ever. But I despise nonsense. And most of the reviews everyone is getting at these big conferences are nonsense.
1 reply · 0 reposts · 1 like · 42 views