Siddartha Devic

171 posts

@sid_devic

PhD student @CSatUSC, previously intern @apple @amazon, ug @UT_Dallas. Trustworthy AI and ML.

Joined August 2017
600 Following · 324 Followers
Pinned Tweet
Siddartha Devic @sid_devic
Multicalibration is a fairness notion which requires predictors to be calibrated over subgroups. In work with @dutchhansen (applying to PhDs!), @PreetumNakkiran, and Vatsal Sharan, we empirically ask: when are machine learning models multicalibrated with no additional effort?🧵
1 reply · 8 reposts · 45 likes · 8K views
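The pinned tweet defines multicalibration as calibration over subgroups. As a concrete illustration, a worst-case-over-subgroups calibration check might look like the following minimal sketch on synthetic data (binning scheme, group choice, and all function names are illustrative, not the paper's):

```python
import numpy as np

def calibration_error(probs, labels, n_bins=10):
    """Binned (ECE-style) calibration error:
    bin-weighted average of |mean prediction - mean label|."""
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    err, total = 0.0, len(probs)
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            err += mask.sum() / total * abs(probs[mask].mean() - labels[mask].mean())
    return err

def multicalibration_error(probs, labels, groups):
    """Worst-case calibration error over subgroups (each group a boolean mask)."""
    return max(calibration_error(probs[g], labels[g]) for g in groups)

rng = np.random.default_rng(0)
p = rng.uniform(size=1000)                      # predicted probabilities
y = (rng.uniform(size=1000) < p).astype(float)  # labels drawn so p is calibrated
groups = [p < 0.5, p >= 0.5]                    # two illustrative subgroups
print(multicalibration_error(p, y, groups))     # small: calibrated on every group
```

A model can have low overall calibration error while one subgroup's error is large; taking the max over group masks is what makes the check "multi".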
Siddartha Devic retweeted
Jesse Zhang @Jesse_Y_Zhang
A reward model that works, zero-shot, across robots, tasks, and scenes? Introducing Robometer: Scaling general-purpose robotic reward models with 1M+ trajectories. Enables zero-shot: online/offline/model-based RL, data retrieval + IL, automatic failure detection, and more! 🧵 (1/12)
7 replies · 105 reposts · 401 likes · 87.2K views
Siddartha Devic retweeted
Viggie Smalls @Viggie_Smalls93
Los Angeles the coldest place on planet earth right now
295 replies · 2.7K reposts · 19.2K likes · 2.5M views
Behrooz Ghorbani @_ghorbani
In Science of Scaling we will focus on three pillars: understanding LLM training dynamics at scale, the role of real and synthetic data, and the science of RL. I am especially excited to pursue this mission together with @MishaLaskin and @real_ioannis at Reflection.

I am building a small, high-trust team that cares deeply about open research, careful measurement, and engineering excellence. If you are interested in the science of pretraining, data, and RL at scale and want to help push the frontier with a focused, tight-knit group, my DMs are open.

I will also be at NeurIPS this week (calendly.com/b-ghorbani-bg/…).
3 replies · 3 reposts · 33 likes · 3.7K views
Behrooz Ghorbani @_ghorbani
Hi friends, after three incredible years at OpenAI I am excited to share that I am starting a new chapter at @reflection_ai, where I will be leading the Science of Scaling team. Our mission is to deepen the scientific understanding of large scale learning and to turn compute into intelligence as efficiently and predictably as possible.
31 replies · 12 reposts · 282 likes · 72.8K views
Saleema Amershi @SaleemaAmershi
📢We're hiring! Join an incredible team building AI agents that work *with* people and contribute meaningfully to society. Details below 👇 P.S. I'll be at #NeurIPS2025 and #WiML this week. DM me to chat about agents🤖 or #MSR AI Frontiers!
9 replies · 5 reposts · 67 likes · 64.3K views
Siddartha Devic @sid_devic
@aparandehgheibi Hi Ali, would love to chat sometime at NeurIPS. I'm currently looking for full-time opportunities (graduating in the spring). I couldn't DM you, so commenting here!
1 reply · 0 reposts · 1 like · 88 views
Ali Parandeh Gheibi @aparandehgheibi
I’m heading to San Diego next week (Wed Dec 3 - Fri Dec 5) for NeurIPS 2025. I’m clearing my schedule to reconnect with old friends and meet new ones obsessed with the "messy" reality of AI engineering. We aren't focused on simple demos; we are focused on climbing the mountain of real-world problems.

I'm specifically looking to connect with people tackling the hardest challenges with agentic systems:
1. Coding Agents
2. Multi-agent orchestration
3. Long horizon tasks (maintaining context over hours and days, not minutes)
4. Robust Agentic Evals (moving beyond static benchmarks)
5. Secure agent design

We are hiring. If you are tired of toy problems and want to discuss what it takes to build reliable and secure agents, let’s find a time to chat.
☕ Let's grab a coffee: DM me here or comment below.
📄 Can't make it to coffee? If this sounds like the mountain you want to climb, please send your CV directly to me.
2 replies · 0 reposts · 15 likes · 7.9K views
Siddartha Devic retweeted
Preetum Nakkiran @PreetumNakkiran
LLMs are notorious for "hallucinating": producing confident-sounding answers that are entirely wrong. But with the right definitions, we can extract a semantic notion of "confidence" from LLMs, and this confidence turns out to be calibrated out-of-the-box in many settings (!)
24 replies · 81 reposts · 585 likes · 51.1K views
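One simple way to make a "semantic notion of confidence" concrete is self-consistency: sample several answers to the same question and use the relative frequency of the most common one. The paper behind the tweet likely uses a more refined definition; this is just a toy sketch of the flavor of the idea:

```python
from collections import Counter

def semantic_confidence(samples):
    """Self-consistency confidence proxy: the most common sampled answer,
    together with the fraction of samples that agree with it."""
    counts = Counter(samples)
    answer, freq = counts.most_common(1)[0]
    return answer, freq / len(samples)

# e.g., 10 sampled answers to the same question
ans, conf = semantic_confidence(
    ["42", "42", "41", "42", "42", "7", "42", "42", "41", "42"]
)
print(ans, conf)  # 42 0.7
```

The claim in the tweet is then that, under the right definition, this kind of confidence score matches empirical accuracy (i.e., is calibrated) in many settings without any extra training.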
Siddartha Devic @sid_devic
@tiancheng_hu This makes sense, I would agree with that hypothesis! Do you have any intuition for why model merging preserves instruction tuning but improves calibration? (I will take a closer look at your paper after I am done with my ICLR reviews haha!)
1 reply · 0 reposts · 0 likes · 127 views
Tiancheng Hu @tiancheng_hu
@sid_devic Thanks! Interesting question - yes, next-token prediction mostly, but the NoveltyBench results are generation. I think verbalized prediction would improve as well (as long as the model is still enough of an "instruct" model to follow the verbalized format)
1 reply · 0 reposts · 2 likes · 30 views
Tiancheng Hu @tiancheng_hu
Instruction tuning unlocks incredible skills in LLMs, but at a cost: they become dangerously overconfident. You face a choice: a well-calibrated base model or a capable but unreliable instruct model. What if you didn't have to choose? What if you could navigate the trade-off? (1/8)
3 replies · 4 reposts · 14 likes · 1.1K views
Siddartha Devic retweeted
Johnny Tian-Zheng Wei @johntzwei
Announcing 🔭✨Hubble, a suite of open-source LLMs to advance the study of memorization! Pretrained models up to 8B params, with controlled insertion of texts (e.g., book passages, biographies, test sets, and more!) designed to emulate key memorization risks 🧵
2 replies · 40 reposts · 130 likes · 47.8K views
Siddartha Devic retweeted
Aayush Karan @aakaran31
We found a new way to get language models to reason. 🤯 No RL, no training, no verifiers, no prompting. ❌ With better sampling, base models can achieve single-shot reasoning on par with (or better than!) GRPO while avoiding its characteristic loss in generation diversity.
73 replies · 249 reposts · 1.7K likes · 267K views
Siddartha Devic @sid_devic
@iclr_conf we only get two weeks to submit five reviews? Surely this is a more accelerated timeline than usual, no?
0 replies · 0 reposts · 4 likes · 153 views
Siddartha Devic @sid_devic
@korolova I remember you said you wanted this at some point as well haha
0 replies · 0 reposts · 0 likes · 81 views
Siddartha Devic @sid_devic
Cursor made me a Chrome extension that redirects any HTML arXiv links you stumble across on the internet to the PDF version of the paper instead. Could be useful for some others, but use at your own risk! github.com/sid-devic/arxi…
1 reply · 0 reposts · 6 likes · 211 views
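The core of such an extension is a one-line URL rewrite from the abstract/HTML page to the PDF. A minimal Python sketch of that idea (the real extension is JavaScript; this regex is illustrative and ignores old-style arXiv IDs like `cs/9901001`):

```python
import re

def arxiv_to_pdf(url):
    """Rewrite an arXiv abs/html URL to its PDF version; leave other URLs alone."""
    m = re.match(r"https?://arxiv\.org/(?:abs|html)/([\w.\-]+?)(?:v(\d+))?/?$", url)
    if not m:
        return url
    paper_id = m.group(1) + (f"v{m.group(2)}" if m.group(2) else "")
    return f"https://arxiv.org/pdf/{paper_id}"

print(arxiv_to_pdf("https://arxiv.org/abs/2506.07461"))
# https://arxiv.org/pdf/2506.07461
print(arxiv_to_pdf("https://arxiv.org/html/2506.07461v2"))
# https://arxiv.org/pdf/2506.07461v2
```

In a Chrome extension the same mapping would typically be expressed as a declarative redirect rule rather than imperative code.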
Siddartha Devic retweeted
Tejas Srinivasan @_Tejas_S_
🚨 Position paper alert! 🚨 LLM uncertainty quantification (UQ) has been explored with the goal of enabling better reliance on LLMs by humans. However, we argue that common LLM UQ practices are detached from this human-centric aspiration.😭😭 arxiv.org/abs/2506.07461
2 replies · 10 reposts · 74 likes · 7.5K views
Siddartha Devic retweeted
florence ⏹️ @morallawwithin
Being a grad student is peak human existence. The end goal of all political and technological progress should be allowing everyone to be a grad student forever
66 replies · 540 reposts · 9.2K likes · 533.9K views
Siddartha Devic retweeted
Kenny Peng @kennylpeng
Are LLMs correlated when they make mistakes? In our new ICML paper, we answer this question using responses of >350 LLMs. We find substantial correlation. On one dataset, LLMs agree on the wrong answer ~2x more than they would at random. 🧵(1/7)
8 replies · 43 reposts · 207 likes · 18.2K views
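The "~2x more than they would at random" comparison can be made concrete on toy data: among questions that both models get wrong, compare their agreement rate to the 1/(k-1) baseline of independent uniform errors over the k-1 wrong options. This is a sketch of the quantity in question; the paper's exact metric may differ:

```python
import numpy as np

def error_correlation_ratio(ans_a, ans_b, gold, n_options):
    """Agreement rate among questions both models get wrong, relative to the
    1/(n_options - 1) baseline for independent uniform errors."""
    ans_a, ans_b, gold = map(np.asarray, (ans_a, ans_b, gold))
    both_wrong = (ans_a != gold) & (ans_b != gold)
    if not both_wrong.any():
        return 0.0
    matches = (ans_a[both_wrong] == ans_b[both_wrong]).sum()
    return float(matches * (n_options - 1) / both_wrong.sum())

# Toy example: 4-option questions, gold answer is option 0 everywhere.
gold    = [0, 0, 0, 0]
model_a = [1, 1, 2, 0]
model_b = [1, 1, 3, 0]
print(error_correlation_ratio(model_a, model_b, gold, n_options=4))  # 2.0
```

A ratio of 1.0 means the models' mistakes look independent; the tweet's finding is a ratio near 2 on one dataset, i.e., substantially correlated errors.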
Siddartha Devic retweeted
Shangshang Wang @UpupWang
Sparse autoencoders (SAEs) can be used to elicit strong reasoning abilities with remarkable efficiency. Using only 1 hour of training at $2 cost without any reasoning traces, we find a way to train 1.5B models via SAEs to score 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23.
10 replies · 55 reposts · 501 likes · 72.2K views