Dan Austin

241 posts

@DAu5tin

I make LLMs work whilst I drink large amounts of tea 🤓🍵 Building at: @Microsoft

London · Joined September 2022
151 Following · 214 Followers

Dan Austin (@DAu5tin)
@AlexGDimakis Thanks! Yes, exactly: the orchestrator is the advisor, trained with GRPO. The subagents are also open models (Qwen3-Coder), but they are left untouched during training and used purely as tool calls. Cost ruled out using closed models or GLM-4.6.
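
For readers unfamiliar with GRPO, here is a minimal sketch of the group-relative advantage it is built around: several rollouts are sampled for the same task, and each rollout's reward is normalised against the mean and standard deviation of its group. The function name, the `eps` value, and the use of the population standard deviation are assumptions for illustration; the thread does not say which GRPO variant or hyperparameters were used.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise each rollout's reward against its own group.

    `rewards` holds one scalar reward per rollout sampled for the same task.
    `eps` (an assumed value) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, variants differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four orchestrator rollouts on one task:
# two solved it (reward 1.0) and two did not (reward 0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

In this sketch only the orchestrator's tokens would receive these advantages; the subagents' outputs are treated as tool results, which matches the description of them being left untouched during training.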

Alex Dimakis (@AlexGDimakis)
@DAu5tin Cool work! So if I understand correctly, the advisor in your architecture is the orchestrator? Is the orchestrator trained with GRPO while the other agents are closed models?

Dan Austin (@DAu5tin)
I scaled coding-agent RL to 32x H100s, achieving a 160% improvement on Stanford's TerminalBench, and it was fun! ⚡️🤓
[Image]

Dan Austin (@DAu5tin)
Also, thank you to @AlexGDimakis, who sparked the idea to train the orchestrator model (in a multi-agent setup) instead of a single agent doing everything, when we discussed the now-released paper "How to Train Your Advisor". Thanks Alex! arxiv.org/pdf/2510.02453

Dan Austin (@DAu5tin)
Thank you to @zjasper and the team at @hyperbolic_labs for providing such a great GPU cloud service, and for always being available for any issues or queries! 10/10!

Dan Austin (@DAu5tin)
Me on Stanford's TerminalBench leaderboard ahead of Claude Code! 🤓 Forgot to post this image before
[Image]

Dan Austin (@DAu5tin)
Here is how it works at a high level! Open source repo below (2/3)
[Image]

Dan Austin (@DAu5tin)
Weekend experiment accidentally beat Claude Code on Stanford's TerminalBench (#12) - turns out orchestrating multiple specialised AI agents with shared memory works better than I expected 🤓 Open sourced! (1/3)
[Image]

Dan Austin (@DAu5tin)
My name on Stanford's TerminalBench leaderboard 🤓 Would love to see how far it would go with some RL compute ⚡️
[Image]

Dan Austin (@DAu5tin)
@_manan2005 Hard to say without knowing the details. I guess a good general rule is to make sure the task you are trying to solve is one that genuinely takes a long time, and then, within RL, to use the reward to discourage the agent from trying to finish too early.
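
One way to read the advice above is as reward shaping against early stopping. The sketch below is only an illustration of that idea: the function name, the `min_turns` threshold, and the penalty size are all hypothetical and do not come from the thread. It leaves successful episodes alone and penalises failed ones in proportion to how early the agent gave up.

```python
def shaped_reward(task_solved: bool, turns_used: int,
                  min_turns: int = 30,
                  early_stop_penalty: float = 0.5) -> float:
    """Task reward with a penalty for giving up early (hypothetical scheme).

    Solved episodes always score 1.0, so finishing quickly is never punished
    when the work is actually done. Unsolved episodes that stop before
    `min_turns` lose up to `early_stop_penalty`, scaled by the shortfall.
    """
    if task_solved:
        return 1.0
    if turns_used < min_turns:
        shortfall = (min_turns - turns_used) / min_turns
        return -early_stop_penalty * shortfall
    return 0.0


# A failed episode that quit halfway to the threshold is penalised;
# one that kept working until the threshold is not.
quit_early = shaped_reward(task_solved=False, turns_used=15)
kept_going = shaped_reward(task_solved=False, turns_used=30)
```

Penalising only unsolved episodes is a deliberate choice in this sketch: a flat bonus for using more turns could teach the agent to pad its conversations rather than persist on hard tasks.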

Manan (@_manan2005)
@DanAiTuning Can you give some tips on how to keep an agent running for hours? I am trying to train one, but the synthetic conversation ends after about 20 to 25 rounds at most.

Dan Austin (@DAu5tin)
Just tested my long-horizon terminal agent RL training on 32x H100s! 🚀 Too GPU poor to actually train though 😅 Happily, though, my untrained agent hit #19 on TerminalBench!
[Image]

Dan Austin (@DAu5tin)
Also, a huge thank you for the amazing work on TerminalBench, which gave me the inspiration to push an AI agent to the limits and be able to evaluate it. 🧗🏔️ Thanks to @Mike_A_Merrill, @alexgshaw and all the contributors to this amazing benchmark.

Dan Austin (@DAu5tin)
Built on the amazing rLLM package - which is a great dev experience for multi-turn RL. Their framework just works and handled everything I threw at it 🙏 Thanks to @michaelzluo, @sijun_tan, @_royh021, and all of the amazing team at @Agentica_