
@benediktstroebl The prompt specifies the budget, and we truncate once the budget is exhausted (though we rarely need to; the model mostly follows the instruction)
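The enforcement described above (prompt states a budget, generation is truncated only if the model overruns it) can be sketched as a simple cap on a token-by-token loop. All names here (`model_step`, the `"<eos>"` sentinel) are hypothetical, not from the thread:

```python
def generate_with_budget(model_step, prompt_tokens, budget):
    """Append tokens from model_step until EOS or the budget is exhausted.

    model_step: callable taking the current token list and returning the
    next token (hypothetical stand-in for a real decoding step).
    """
    out = list(prompt_tokens)
    for _ in range(budget):  # hard cap: truncate once the budget is spent
        tok = model_step(out)
        if tok == "<eos>":   # model stopped on its own, within budget
            break
        out.append(tok)
    return out
```

If the model follows the budget instruction, the `"<eos>"` branch fires first and the cap is never hit, matching the reply's observation that truncation is rarely needed.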
Jonathan Berant

@JonathanBerant
NLP at Tel-Aviv University and Google


Newish work (arXived in Dec.): Prompts can be ambiguous, but handling ambiguity is context- and user-dependent. Sometimes the right thing is to ask a clarifying question, sometimes to give multiple answers, and sometimes to just guess. Can we train models to change their strategy per context?

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. Applying it to math reasoning and to an internal chat assistant, we find that on-policy distillation can outperform other approaches at a fraction of the cost. thinkingmachines.ai/blog/on-policy…
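The tweet's "reward density" refers to a per-token training signal: the student samples its own trajectories (on-policy, like RL) and each sampled token is scored against the teacher's distribution (dense, like SFT). As a hedged sketch of one such per-token loss, not necessarily the post's exact implementation, here is a reverse KL from student to teacher over a toy vocabulary:

```python
import math

def reverse_kl_per_token(student_probs, teacher_probs):
    """Reverse KL(student || teacher) at a single token position.

    Illustrative sketch only: in on-policy distillation the student
    generates the rollout, and a dense loss like this is computed at
    every token of the student's own sample, so the student's actual
    mistakes are the ones being corrected.
    """
    return sum(
        p * math.log(p / q)
        for p, q in zip(student_probs, teacher_probs)
        if p > 0  # terms with zero student mass contribute nothing
    )
```

When the student already matches the teacher the loss is zero; any mismatch on the student's own samples yields a positive per-token penalty, which is what gives the method SFT-like signal density on RL-like on-policy data.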
