Stephen Bach
@stevebach
1.8K posts

Asst. prof. @BrownCSDept. Working on improving how humans teach computers. Weak supervision, zero-shot learning, few-shot learning, and high-level knowledge.

Joined August 2007
502 Following · 1.6K Followers
Stephen Bach retweeted
Tiancheng Hu @tiancheng_hu
1/7 🧵 The GPT-4 technical report featured detailed calibration curves. Since then, not a single major model release has reported calibration. The field quietly stopped measuring whether models know what they don't know. Our new position paper argues this is a mistake. Here's why.
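For readers unfamiliar with the metric mentioned in the thread above: a calibration curve bins a model's predictions by stated confidence and compares each bin's average confidence to its empirical accuracy. A minimal NumPy sketch of the closely related Expected Calibration Error; the function name and toy data are illustrative, not from the thread:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence and average the gap between each
    bin's mean confidence and its empirical accuracy, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = abs(confidences[mask].mean() - correct[mask].mean())
        ece += mask.mean() * gap  # weight by fraction of samples in the bin
    return ece

# A model that says "80% sure" and is right 8 times out of 10 is calibrated.
conf = np.full(10, 0.8)
corr = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
print(expected_calibration_error(conf, corr))  # close to 0
```

A calibration curve is the same binning, plotted as accuracy vs. confidence per bin; reporting it is what the thread argues model releases have stopped doing.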
Stephen Bach retweeted
Nihal Nayak @nihalcanrun
Targeted instruction tuning for LLMs involves selecting a subset of instructions from a candidate pool using a small query set from target tasks. Despite growing interest, we still lack guidance on what to select. Our new preprint brings clarity to this space (thread 👇).
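As a rough illustration of the setup described above (not the preprint's actual method), one common baseline scores each candidate instruction's embedding against the target-task query set and keeps the nearest ones; the function name and toy data here are made up:

```python
import numpy as np

def select_top_k(candidate_embs, query_embs, k=2):
    """Baseline sketch: score each candidate by its best cosine similarity
    to any query embedding, then keep the k highest-scoring candidates."""
    def unit(x):
        x = np.asarray(x, dtype=float)
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    sims = unit(candidate_embs) @ unit(query_embs).T  # candidates x queries
    scores = sims.max(axis=1)                          # best match per candidate
    return np.argsort(-scores)[:k]                     # indices into the pool

# Toy 2-D "embeddings": the query set points along the x-axis.
pool = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
queries = [[1.0, 0.0]]
print(select_top_k(pool, queries))  # picks candidates 0 and 2
```

The open question the preprint addresses is precisely which scoring rule to use; nearest-neighbor similarity is only one of many candidates.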
Stephen Bach retweeted
Alex Ratner @ajratner
Simple (proposed!) rule for terminology around synthetic data: If a "synthetic generation" method uses model A to generate data that leads to gains on model B, where A >> B - this is distillation, not synthetic generation :) The true technical challenge of synthetic data is to use model A, plus some cleverness around system architecture and/or human-in-the-loop input (e.g. context eng, review/filtering, editing), to produce data that improves model B where B >= A.
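The proposed rule can be restated as a one-line check; the numeric capability scores and the strict boundary below are my own simplification of the tweet's A >> B vs. B >= A distinction:

```python
def classify_generation(teacher: float, student: float) -> str:
    """Toy restatement of the rule above: data generated by a much stronger
    model A to improve a weaker model B is distillation; the hard case,
    true "synthetic generation", is improving a model at least as strong
    as the generator. The plain > boundary simplifies the tweet's A >> B."""
    return "distillation" if teacher > student else "synthetic generation"

print(classify_generation(teacher=0.9, student=0.5))  # distillation
print(classify_generation(teacher=0.6, student=0.8))  # synthetic generation
```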
Stephen Bach retweeted
Yisong Yue @yisongyue
I am saddened by the loss of Joe Halpern. I still remember taking his Reasoning About Uncertainty class during my first year as a PhD student at @Cornell. Joe leaves behind a tremendous legacy, not only in his research, but the lives of so many students he touched along the way. bangsfuneralhome.com/obituaries/jos…
Stephen Bach retweeted
Alex Ratner @ajratner
This week we launched the Open Benchmarks Grant with a $3M initial commitment from @SnorkelAI + partner support from @huggingface @togethercompute @PrimeIntellect @PyTorch @harborframework & others, in order to close the evaluation gap in AI. Our ability to measure AI has been outpaced by our ability to develop it, and open benchmarks are one of several critical, complementary tools to fix this. We're particularly interested in novel benchmarks that push and probe the frontier along three key vectors:
(1) Environment complexity, e.g. complex, domain-specific context and tool/action spaces, human interaction, world modeling
(2) Autonomy horizon, e.g. long horizons and non-stationary goals
(3) Output complexity, e.g. complex outputs with nuanced, rubric-based evaluation / reward signals
Check out more detail + link to apply here! benchmarks.snorkel.ai
Stephen Bach retweeted
Omar Khattab @lateinteraction
PSA: If you're not currently following @jacobli99 and staying tuned, you really really should this week.
Stephen Bach retweeted
Dylan Sam @dylanjsam
I'm at NeurIPS this week! Excited to meet old/new friends and chat with people about training safer language models. I'm presenting a few works on safety pretraining, measuring diversity in data curation, and monitoring model behaviors --- more info below 👇
Stephen Bach retweeted
Dyah Adila 🦄 @dyahadila_
⭐ New blog post! Most people think activation steering ≈ a cheap version of finetuning. But why does it sometimes work, and sometimes fall flat? We dug into this and found a surprisingly clear answer. Full breakdown here 👇 sprocketlab.github.io/posts/2025/11/…
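For context on what "activation steering" refers to above: at inference time, a steering vector (often the difference of mean activations over two contrastive prompt sets) is added to one layer's hidden state. A toy NumPy sketch with made-up dimensions and data, not the blog post's code:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Hypothetical contrastive activations (e.g. "polite" vs. "rude" prompts);
# the steering vector is the difference of their means.
polite_acts = rng.normal(size=(32, d_model))
rude_acts = rng.normal(size=(32, d_model))
steer = polite_acts.mean(axis=0) - rude_acts.mean(axis=0)

def apply_steering(hidden, vector, alpha=4.0):
    """Add a scaled steering vector to one layer's hidden state."""
    return hidden + alpha * vector

h = rng.normal(size=d_model)
h_steered = apply_steering(h, steer)

# The edit strictly increases the hidden state's component along the
# steering direction (by exactly alpha * ||steer||^2).
print(h_steered @ steer > h @ steer)  # True
```

Why it sometimes works and sometimes falls flat is the blog post's actual question; this sketch only shows the mechanical operation being analyzed.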
Stephen Bach retweeted
Yeganeh Kordi @yeganekordi
How well do language models generalize to problems that are harder, or even easier, than the ones they’ve trained on? We show that LLMs don’t generalize across difficulty levels quite as much as you might think. 🧵
Stephen Bach retweeted
Tal Linzen @tallinzen
I too am recruiting PhD students this year! Things I think about: cognitively plausible LLMs, interpretability, evaluating and improving multi-turn interaction, LLMs for cognitive science and neuroscience, psycholinguistics... The deadline for Data Science is Dec 6 and for Linguistics Dec 18.
Stephen Bach retweeted
Brown Research @BrownUResearch
ARIA, a Brown-based research consortium supported by a $20 million grant from the National Science Foundation, welcomed scientists from across the U.S. to kick off its five-year program with a launch event in Providence. @BrownUniversity brown.edu/news/2025-11-2…
Stephen Bach retweeted
Snorkel AI @SnorkelAI
We’re excited to join the 2025 @Deloitte #Fast500! Our CEO @alexratner sums it up well: AI progress starts with better data — not just bigger models. A big milestone for our team and our work advancing expert-verified data, rigorous benchmarks, and trustworthy AI systems. Huge thanks to our customers, partners, and expert community.
Stephen Bach retweeted
Pushmeet Kohli @pushmeet
(1) Our team at @GoogleDeepMind has been collaborating with Terence Tao and Javier Gómez-Serrano to use our AI agents (AlphaEvolve, AlphaProof, & Gemini Deep Think) for advancing Maths research. They find that AlphaEvolve can help discover new results across a range of problems.
Stephen Bach retweeted
Yong Zheng-Xin @yong_zhengxin
🚨 Reasoning models can “self-jailbreak”: they recognize a request is harmful, invent a reason why it’s fine, then help with it. We found that after training on benign math/code reasoning, models emergently start to reason themselves out of safety alignment. 🧵👇