
@benediktstroebl The prompt specifies the budget, and we truncate once the budget is exhausted (though we rarely need to; the model mostly follows the instruction)
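The enforcement described above (prompt states a budget, generation is truncated only if the model overruns it) can be sketched as a simple cap on a token-by-token loop. All names here (`model_step`, the `"<eos>"` sentinel) are hypothetical, not from the thread:

```python
def generate_with_budget(model_step, prompt_tokens, budget):
    """Append tokens from model_step until EOS or the budget is exhausted.

    model_step: callable taking the current token list and returning the
    next token (hypothetical stand-in for a real decoding step).
    """
    out = list(prompt_tokens)
    for _ in range(budget):  # hard cap: truncate once the budget is spent
        tok = model_step(out)
        if tok == "<eos>":   # model stopped on its own, within budget
            break
        out.append(tok)
    return out
```

If the model follows the budget instruction, the `"<eos>"` branch fires first and the cap is never hit, matching the reply's observation that truncation is rarely needed.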
Jonathan Berant

@JonathanBerant
NLP at Tel-Aviv University and Google


Newish work (arXived in Dec.): Prompts can be ambiguous, but handling ambiguity is context- and user-dependent. Sometimes the right thing is to ask a clarifying question, sometimes to give multiple answers, and sometimes to just guess. Can we train models to change their strategy per context?

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. Applying it to math reasoning and to an internal chat assistant, we find that on-policy distillation can outperform other approaches at a fraction of the cost. thinkingmachines.ai/blog/on-policy…
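The tweet's "reward density" refers to a per-token training signal: the student samples its own trajectories (on-policy, like RL) and each sampled token is scored against the teacher's distribution (dense, like SFT). As a hedged sketch of one such per-token loss, not necessarily the post's exact implementation, here is a reverse KL from student to teacher over a toy vocabulary:

```python
import math

def reverse_kl_per_token(student_probs, teacher_probs):
    """Reverse KL(student || teacher) at a single token position.

    Illustrative sketch only: in on-policy distillation the student
    generates the rollout, and a dense loss like this is computed at
    every token of the student's own sample, so the student's actual
    mistakes are the ones being corrected.
    """
    return sum(
        p * math.log(p / q)
        for p, q in zip(student_probs, teacher_probs)
        if p > 0  # terms with zero student mass contribute nothing
    )
```

When the student already matches the teacher the loss is zero; any mismatch on the student's own samples yields a positive per-token penalty, which is what gives the method SFT-like signal density on RL-like on-policy data.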
