Stephen Bach

1.8K posts

@stevebach

Asst. prof. @BrownCSDept. Working on improving how humans teach computers. Weak supervision, zero-shot learning, few-shot learning, and high-level knowledge.

Joined August 2007

502 Following, 1.6K Followers

Stephen Bach@stevebach·4d

@yong_zhengxin Thank you Yong! I am lucky to get to work with you

Yong Zheng-Xin@yong_zhengxin·6d

congrats @stevebach!! so glad to be advised by you

Brown CS@BrownCSDept

@BrownCSDept is happy to announce that with the anticipated approval of @BrownUniversity's Corporation and effective as of July 1, 2026, @stevebach has been promoted to Associate Professor with tenure. Learn more at Brown CS News: cs.brown.edu/news/2026/05/2…

876

Stephen Bach retweeted

Brown CS@BrownCSDept·13 May

@BrownCSDept Master's student Ilana Nguyen speaks at a United Nations panel focused on ensuring that AI expands opportunity. Learn more at Brown CS Blog: blog.cs.brown.edu/2026/05/13/bro…

2.3K

Stephen Bach retweeted

Cristina Menghini@CriMenghini·15 Apr

Last week we launched Muse Spark at an acceptable risk level under our Advanced AI scaling framework, after multiple mitigation iterations. Today we’re releasing its first Safety & Preparedness Report documenting that decision. This was a long, cross-team effort — from catastrophic risk assessment to day-to-day model behavior. We hope this contributes to transparent discussion of responsible development of personal superintelligence. Running the evals, it was fascinating to watch the model’s safety profile take shape. Under the new framework, we’re also introducing our first assessment of loss of control risks — built on extensive threat modeling that’s still evolving. The report’s dense and there’s a lot of work ahead. You can find the full report here: ai.meta.com/static-resourc…— we’re eager to hear feedback and improve.

Summer Yue@summeryue0

🚀 Muse Spark Safety & Preparedness Report for Meta AI is out. We start with our pre-deployment assessment under Meta's Advanced AI Scaling Framework, covering chemical and biological, cybersecurity, and loss of control risks. Our assessment flagged potentially elevated chem/bio risk, so we implemented safeguards and validated mitigations before deployment - bringing residual risk to within acceptable levels. Beyond the Framework, we also share findings and early explorations of model behavior (honesty, intent understanding, etc.), jailbreak robustness, eval awareness, and more. We're sharing this report to give a closer look at how we evaluate advanced AI safety. Always more work to do, and we welcome feedback from the community. ai.meta.com/static-resourc…

5.6K

Stephen Bach retweeted

ACM Conference on AI and Agentic Systems@CAISconf·31 Mar

The first @TheOfficialACM conference on agentic AI systems just got a boost. @SnorkelAI is joining as a sponsor of @CAISconf this May in San Jose. Stanford AI Lab roots, production AI focus, and a shared belief that this community needs a rigorous home. caisconf.org

1.4K

Stephen Bach retweeted

Deb Raji@rajiinio·30 Mar

I thought "AI for Science" was something like AlphaFold, ie. using AI to creatively address computational bottlenecks for well articulated scientific problems. Now I'm seeing more of "AI slop cosplaying as research paper", where the problems are fake, methods unverified, etc.

653

33.7K

Stephen Bach retweeted

Tiancheng Hu @ ICLR 2026@tiancheng_hu·24 Feb

1/7 🧵 The GPT-4 technical report featured detailed calibration curves. Since then, not a single major model release has reported calibration. The field quietly stopped measuring whether models know what they don't know. Our new position paper argues this is a mistake. Here's why.

1.9K

Stephen Bach retweeted

Nihal Nayak@nihalcanrun·24 Feb

Targeted instruction tuning for LLMs involves selecting a subset of instructions from a candidate pool using a small query set from target tasks. Despite growing interest, we still lack guidance on what to select. Our new preprint brings clarity to this space (thread 👇).

3.4K

Stephen Bach retweeted

Alex Ratner@ajratner·20 Feb

Simple (proposed!) rule for terminology around synthetic data: If a "synthetic generation" method uses model A to generate data that leads to gains on model B, where A >> B - this is distillation, not synthetic generation :) The true technical challenge of synthetic data is to use model A, plus some cleverness around system architecture and/or human-in-the-loop input (e.g. context eng, review/filtering, editing), to produce data that improves model B where B >= A.

4.1K

Stephen Bach retweeted

Yisong Yue@yisongyue·16 Feb

I am saddened by the loss of Joe Halpern. I still remember taking his Reasoning About Uncertainty class during my first year as a PhD student at @Cornell. Joe leaves behind a tremendous legacy, not only in his research, but the lives of so many students he touched along the way. bangsfuneralhome.com/obituaries/jos…

5.9K

Stephen Bach retweeted

Alex Ratner@ajratner·16 Feb

This week we launched the Open Benchmarks Grant with a $3M initial commitment from @SnorkelAI + partner support from @huggingface @togethercompute @PrimeIntellect @PyTorch @harborframework & others, in order to close the evaluation gap in AI. Our ability to measure AI has been outpaced by our ability to develop it - and open benchmarks are one of several critical, complementary tools to fix this. We're particularly interested in novel benchmarks that push and probe the frontier along three key vectors:
(1) Environment complexity --> E.g. complex, domain-specific context and tool/action spaces, human interaction, world modeling
(2) Autonomy horizon --> E.g. long horizon, non-stationary goals
(3) Output complexity --> E.g. complex outputs with nuanced, rubric-based evaluation / reward signals
Check out more detail + link to apply here! benchmarks.snorkel.ai

7.6K

Stephen Bach@stevebach·13 Feb

Awesome that @SnorkelAI is investing in open evaluation for agents! We’ve always said that data is the bottleneck. With increasing model capabilities, it’s often the *evaluation* data that limits progress now. Excited to see what gets built!

vincent sunn chen@vincentsunnchen

x.com/i/article/2021…

1.3K

Stephen Bach retweeted

Omar Khattab@lateinteraction·26 Jan

PSA: If you're not currently following @jacobli99 and staying tuned, you really really should this week.

205

39.1K

Stephen Bach retweeted

Daniel Khashabi 🕊️@DanielKhashabi·2 Jan

Postdoc positions: ai.jhu.edu/careers/postdo… Applications are due January 23, 2026. Positions are for 2 years with the possibility of an extension.

2.1K

Stephen Bach retweeted

Antonia Noori Farzan@antoniafarzan·17 Dec

I may be biased because she's a friend, but this piece by @H_Lev is the best first-person account I've read summing up the mood in Providence right now motherjones.com/politics/2025/…

190

23.6K

Stephen Bach retweeted

Brown Data Science Institute@Brown_DSI·8 Dec

The Data Science Institute is pleased to announce our inaugural 2026 Early Career Breakthrough Research Award recipients! Congratulations to Ying Ma @yingma0107 (@BrownBiostats ), Loukas Gouskos (@brown_physics ), and Kim Fernandes (@BrownAnthro )! dsi.brown.edu/news/2025-12-0…

306

Stephen Bach retweeted

Dylan Sam@dylanjsam·4 Dec

I'm at NeurIPS this week! Excited to meet old/new friends and chat with people about training safer language models. I'm presenting a few works on safety pretraining, measuring diversity in data curation, and monitoring model behaviors --- more info below 👇

4.2K

Stephen Bach retweeted

Dyah Adila 🦄@dyahadila_·1 Dec

⭐ New blog post! Most people think activation steering ≈ a cheap version of finetuning. But why does it sometimes work, and sometimes fall flat? We dug into this and found a surprisingly clear answer. Full breakdown here 👇 sprocketlab.github.io/posts/2025/11/…

5.9K

Stephen Bach retweeted

Yeganeh Kordi@yeganekordi·29 Nov

How well do language models generalize to problems that are harder, or even easier, than the ones they’ve trained on? We show that LLMs don’t generalize across difficulty levels quite as much as you might think. 🧵

2.9K

Stephen Bach retweeted

Tal Linzen@tallinzen·25 Nov

I too am recruiting PhD students this year! things I think about: cognitively plausible LLMs, interpretability, evaluating and improving multi-turn interaction, LLMs for cognitive science and neuroscience, psycholinguistics... the deadline for Data Science is Dec 6 and for Linguistics Dec 18.

351

25.6K

Stephen Bach retweeted

Brown Research@BrownUResearch·24 Nov

ARIA, a Brown-based research consortium supported by a $20 million grant from the National Science Foundation, welcomed scientists from across the U.S. to kick off its five-year program with a launch event in Providence. @BrownUniversity brown.edu/news/2025-11-2…

1.6K
