Siddartha Devic

171 posts

@sid_devic

PhD student @CSatUSC, previously intern @apple @amazon, ug @UT_Dallas. Trustworthy AI and ML.

Joined August 2017
600 Following · 324 Followers
Pinned Tweet
Siddartha Devic @sid_devic
Multicalibration is a fairness notion which requires predictors to be calibrated over subgroups. In work with @dutchhansen (applying to PhDs!), @PreetumNakkiran, and Vatsal Sharan, we empirically ask: when are machine learning models multicalibrated with no additional effort?🧵
1 reply · 8 reposts · 45 likes · 8K views
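The pinned tweet defines multicalibration as calibration over subgroups. As a concrete illustration, a worst-case-over-subgroups calibration check might look like the following minimal sketch on synthetic data (binning scheme, group choice, and all function names are illustrative, not the paper's):

```python
import numpy as np

def calibration_error(probs, labels, n_bins=10):
    """Binned (ECE-style) calibration error:
    bin-weighted average of |mean prediction - mean label|."""
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    err, total = 0.0, len(probs)
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            err += mask.sum() / total * abs(probs[mask].mean() - labels[mask].mean())
    return err

def multicalibration_error(probs, labels, groups):
    """Worst-case calibration error over subgroups (each group a boolean mask)."""
    return max(calibration_error(probs[g], labels[g]) for g in groups)

rng = np.random.default_rng(0)
p = rng.uniform(size=1000)                      # predicted probabilities
y = (rng.uniform(size=1000) < p).astype(float)  # labels drawn so p is calibrated
groups = [p < 0.5, p >= 0.5]                    # two illustrative subgroups
print(multicalibration_error(p, y, groups))     # small: calibrated on every group
```

A model can have low overall calibration error while one subgroup's error is large; taking the max over group masks is what makes the check "multi".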
Siddartha Devic retweeted
Jesse Zhang @Jesse_Y_Zhang
A reward model that works, zero-shot, across robots, tasks, and scenes? Introducing Robometer: Scaling general-purpose robotic reward models with 1M+ trajectories. Enables zero-shot: online/offline/model-based RL, data retrieval + IL, automatic failure detection, and more! 🧵 (1/12)
7 replies · 105 reposts · 401 likes · 87.2K views
Siddartha Devic retweeted
Viggie Smalls @Viggie_Smalls93
Los Angeles the coldest place on planet earth right now
295 replies · 2.7K reposts · 19.2K likes · 2.5M views
Behrooz Ghorbani @_ghorbani
In Science of Scaling we will focus on three pillars: understanding LLM training dynamics at scale, the role of real and synthetic data, and the science of RL. I am especially excited to pursue this mission together with @MishaLaskin and @real_ioannis at Reflection.

I am building a small, high-trust team that cares deeply about open research, careful measurement, and engineering excellence. If you are interested in the science of pretraining, data, and RL at scale and want to help push the frontier with a focused, tight-knit group, my DMs are open.

I will also be at NeurIPS this week (calendly.com/b-ghorbani-bg/…).
3 replies · 3 reposts · 33 likes · 3.7K views
Behrooz Ghorbani @_ghorbani
Hi friends, after three incredible years at OpenAI I am excited to share that I am starting a new chapter at @reflection_ai, where I will be leading the Science of Scaling team. Our mission is to deepen the scientific understanding of large scale learning and to turn compute into intelligence as efficiently and predictably as possible.
31 replies · 12 reposts · 282 likes · 72.8K views
Saleema Amershi @SaleemaAmershi
📢We're hiring! Join an incredible team building AI agents that work *with* people and contribute meaningfully to society. Details below 👇 P.S. I'll be at #NeurIPS2025 and #WiML this week. DM me to chat about agents🤖 or #MSR AI Frontiers!
9 replies · 5 reposts · 67 likes · 64.3K views
Siddartha Devic @sid_devic
@aparandehgheibi Hi Ali, would love to chat sometime at NeurIPS. I'm currently looking for full-time opportunities (graduating in the spring). I couldn't DM you, so commenting here!
1 reply · 0 reposts · 1 like · 88 views
Ali Parandeh Gheibi @aparandehgheibi
I’m heading to San Diego next week (Wed Dec 3 - Fri Dec 5) for NeurIPS 2025. I’m clearing my schedule to reconnect with old friends and meet new ones obsessed with the "messy" reality of AI engineering. We aren't focused on simple demos; we are focused on climbing the mountain of real-world problems.

I'm specifically looking to connect with people tackling the hardest challenges with agentic systems:
1. Coding Agents
2. Multi-agent orchestration
3. Long horizon tasks (maintaining context over hours and days, not minutes)
4. Robust Agentic Evals (moving beyond static benchmarks)
5. Secure agent design

We are hiring. If you are tired of toy problems and want to discuss what it takes to build reliable and secure agents, let’s find a time to chat.
☕ Let's grab a coffee: DM me here or comment below.
📄 Can't make it to coffee? If this sounds like the mountain you want to climb, please send your CV directly to me.
2 replies · 0 reposts · 15 likes · 7.9K views
Siddartha Devic retweeted
Preetum Nakkiran @PreetumNakkiran
LLMs are notorious for "hallucinating": producing confident-sounding answers that are entirely wrong. But with the right definitions, we can extract a semantic notion of "confidence" from LLMs, and this confidence turns out to be calibrated out-of-the-box in many settings (!)
24 replies · 81 reposts · 585 likes · 51.1K views
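One simple way to make a "semantic notion of confidence" concrete is self-consistency: sample several answers to the same question and use the relative frequency of the most common one. The paper behind the tweet likely uses a more refined definition; this is just a toy sketch of the flavor of the idea:

```python
from collections import Counter

def semantic_confidence(samples):
    """Self-consistency confidence proxy: the most common sampled answer,
    together with the fraction of samples that agree with it."""
    counts = Counter(samples)
    answer, freq = counts.most_common(1)[0]
    return answer, freq / len(samples)

# e.g., 10 sampled answers to the same question
ans, conf = semantic_confidence(
    ["42", "42", "41", "42", "42", "7", "42", "42", "41", "42"]
)
print(ans, conf)  # 42 0.7
```

The claim in the tweet is then that, under the right definition, this kind of confidence score matches empirical accuracy (i.e., is calibrated) in many settings without any extra training.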
Siddartha Devic @sid_devic
@tiancheng_hu This makes sense, I would agree with that hypothesis! Do you have any intuition for why model merging preserves instruction tuning but improves calibration? (I will take a closer look at your paper after I am done with my ICLR reviews haha!)
1 reply · 0 reposts · 0 likes · 127 views
Tiancheng Hu @tiancheng_hu
@sid_devic Thanks! Interesting question - yes, next-token prediction mostly, but the NoveltyBench results are generation. I think verbalized prediction would improve as well (as long as the model is still enough of an "instruct" model to follow the verbalized format)
1 reply · 0 reposts · 2 likes · 30 views
Tiancheng Hu @tiancheng_hu
Instruction tuning unlocks incredible skills in LLMs, but at a cost: they become dangerously overconfident. You face a choice: a well-calibrated base model or a capable but unreliable instruct model. What if you didn't have to choose? What if you could navigate the trade-off? (1/8)
3 replies · 4 reposts · 14 likes · 1.1K views
Siddartha Devic retweeted
Johnny Tian-Zheng Wei @johntzwei
Announcing 🔭✨Hubble, a suite of open-source LLMs to advance the study of memorization! Pretrained models up to 8B params, with controlled insertion of texts (e.g., book passages, biographies, test sets, and more!) designed to emulate key memorization risks 🧵
2 replies · 40 reposts · 130 likes · 47.8K views
Siddartha Devic retweeted
Aayush Karan @aakaran31
We found a new way to get language models to reason. 🤯 No RL, no training, no verifiers, no prompting. ❌ With better sampling, base models can achieve single-shot reasoning on par with (or better than!) GRPO while avoiding its characteristic loss in generation diversity.
73 replies · 249 reposts · 1.7K likes · 267K views
Siddartha Devic @sid_devic
@iclr_conf we only get two weeks to submit five reviews? Surely this is a more accelerated timeline than usual, no?
0 replies · 0 reposts · 4 likes · 153 views
Siddartha Devic @sid_devic
@korolova I remember you said you wanted this at some point as well haha
0 replies · 0 reposts · 0 likes · 81 views
Siddartha Devic @sid_devic
Cursor made me a Chrome extension that redirects any HTML arXiv links you stumble across on the internet to the PDF version of the paper instead. Could be useful for some others, but use at your own risk! github.com/sid-devic/arxi…
1 reply · 0 reposts · 6 likes · 211 views
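The core of such an extension is a one-line URL rewrite from the abstract/HTML page to the PDF. A minimal Python sketch of that idea (the real extension is JavaScript; this regex is illustrative and ignores old-style arXiv IDs like `cs/9901001`):

```python
import re

def arxiv_to_pdf(url):
    """Rewrite an arXiv abs/html URL to its PDF version; leave other URLs alone."""
    m = re.match(r"https?://arxiv\.org/(?:abs|html)/([\w.\-]+?)(?:v(\d+))?/?$", url)
    if not m:
        return url
    paper_id = m.group(1) + (f"v{m.group(2)}" if m.group(2) else "")
    return f"https://arxiv.org/pdf/{paper_id}"

print(arxiv_to_pdf("https://arxiv.org/abs/2506.07461"))
# https://arxiv.org/pdf/2506.07461
print(arxiv_to_pdf("https://arxiv.org/html/2506.07461v2"))
# https://arxiv.org/pdf/2506.07461v2
```

In a Chrome extension the same mapping would typically be expressed as a declarative redirect rule rather than imperative code.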
Siddartha Devic retweeted
Tejas Srinivasan @_Tejas_S_
🚨 Position paper alert! 🚨 LLM uncertainty quantification (UQ) has been explored with the goal of enabling better reliance on LLMs by humans. However, we argue that common LLM UQ practices are detached from this human-centric aspiration.😭😭 arxiv.org/abs/2506.07461
2 replies · 10 reposts · 74 likes · 7.5K views
Siddartha Devic retweeted
florence ⏹️ @morallawwithin
Being a grad student is peak human existence. The end goal of all political and technological progress should be allowing everyone to be a grad student forever
66 replies · 540 reposts · 9.2K likes · 533.9K views
Siddartha Devic retweeted
Kenny Peng @kennylpeng
Are LLMs correlated when they make mistakes? In our new ICML paper, we answer this question using responses of >350 LLMs. We find substantial correlation. On one dataset, LLMs agree on the wrong answer ~2x more than they would at random. 🧵(1/7)
8 replies · 43 reposts · 207 likes · 18.2K views
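The "~2x more than they would at random" comparison can be made concrete on toy data: among questions that both models get wrong, compare their agreement rate to the 1/(k-1) baseline of independent uniform errors over the k-1 wrong options. This is a sketch of the quantity in question; the paper's exact metric may differ:

```python
import numpy as np

def error_correlation_ratio(ans_a, ans_b, gold, n_options):
    """Agreement rate among questions both models get wrong, relative to the
    1/(n_options - 1) baseline for independent uniform errors."""
    ans_a, ans_b, gold = map(np.asarray, (ans_a, ans_b, gold))
    both_wrong = (ans_a != gold) & (ans_b != gold)
    if not both_wrong.any():
        return 0.0
    matches = (ans_a[both_wrong] == ans_b[both_wrong]).sum()
    return float(matches * (n_options - 1) / both_wrong.sum())

# Toy example: 4-option questions, gold answer is option 0 everywhere.
gold    = [0, 0, 0, 0]
model_a = [1, 1, 2, 0]
model_b = [1, 1, 3, 0]
print(error_correlation_ratio(model_a, model_b, gold, n_options=4))  # 2.0
```

A ratio of 1.0 means the models' mistakes look independent; the tweet's finding is a ratio near 2 on one dataset, i.e., substantially correlated errors.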
Siddartha Devic retweeted
Shangshang Wang @UpupWang
Sparse autoencoders (SAEs) can be used to elicit strong reasoning abilities with remarkable efficiency. Using only 1 hour of training at $2 cost without any reasoning traces, we find a way to train 1.5B models via SAEs to score 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23.
10 replies · 55 reposts · 501 likes · 72.2K views