Dan Austin

241 posts

@DAu5tin

I make LLMs work whilst I drink large amounts of tea 🤓🍵 Building at: @Microsoft

London · Joined September 2022

151 Following · 214 Followers

Dan Austin @DAu5tin · Nov 3

@AlexGDimakis Thanks! Yes, exactly: the orchestrator is the advisor, trained with GRPO. The subagents are also open models (Qwen3-Coder); however, they are left untouched during training and used purely as tool calls. Cost prevented using closed models or GLM-4.6.

Alex Dimakis @AlexGDimakis · Nov 3

@DAu5tin Cool work! So if I understand correctly, the advisor in your architecture is the orchestrator? Is the orchestrator trained with GRPO while the other agents are closed models?

Dan Austin @DAu5tin · Nov 3

I scaled coding-agent RL to 32x H100s, achieving a 160% improvement on Stanford's TerminalBench, and it was fun! ⚡️🤓

Dan Austin @DAu5tin · Nov 3

Also, thank you to @AlexGDimakis, who sparked the idea to train the orchestrator model (in a multi-agent setup) instead of a single agent doing everything, when we discussed the now-released paper "How to Train Your Advisor". Thanks Alex! arxiv.org/pdf/2510.02453

Dan Austin @DAu5tin · Nov 3

Thank you to @zjasper and the team at @hyperbolic_labs for providing such a great GPU cloud service, and for always being available for any issues or queries! 10/10!

Dan Austin @DAu5tin · Sep 11

github.com/Danau5tin/mult…

Dan Austin @DAu5tin · Sep 11

Me on Stanford's TerminalBench leaderboard ahead of Claude Code! 🤓 Forgot to post this image before

Dan Austin @DAu5tin · Sep 2

Lots more detail in here: github.com/Danau5tin/mult…

Dan Austin @DAu5tin · Sep 2

Here is how it works at a high level! Open source repo below (2/3)

Dan Austin @DAu5tin · Sep 2

Weekend experiment accidentally beat Claude Code on Stanford's TerminalBench (#12) - turns out orchestrating multiple specialised AI agents with shared memory works better than I expected 🤓 Open sourced! (1/3)

Dan Austin @DAu5tin · Aug 3

All open source, with 208 ⭐️s: github.com/Danau5tin/term…

Dan Austin @DAu5tin · Aug 3

My name on Stanford's TerminalBench leaderboard 🤓 Would love to see how far it would go with some RL compute ⚡️

Dan Austin @DAu5tin · Jul 31

@_manan2005 Hard to say without knowing the details. I guess a good general rule is to make sure the task you are trying to solve is one that takes a long time, and then, within RL, to incentivise the agent (via reward) not to finish too early.

Manan @_manan2005 · Jul 30

@DanAiTuning Can you give some tips on how to keep an agent running for hours? I am trying to train one, but the synthetic conversation ends after 20 to 25 rounds at most.

Dan Austin @DAu5tin · Jul 29

Just tested my long-horizon terminal agent RL training on 32x H100s! 🚀 Too GPU-poor to actually train, though 😅 Happily, my untrained agent hit #19 on TerminalBench!

Dan Austin @DAu5tin · Jul 30

@zjasper @hyperbolic_labs Yes thank you for such a great platform!

Jasper @zjasper · Jul 29

@DanAiTuning Excited to see that you trained this agent using @hyperbolic_labs!

Dan Austin @DAu5tin · Jul 29

Everything is open source (agent, data, training code) github.com/Danau5tin/term…

Dan Austin @DAu5tin · Jul 29

Also, a huge thank you for the amazing work on TerminalBench, which gave me the inspiration to push an AI agent to the limits and be able to evaluate it. 🧗🏔️ Thanks to @Mike_A_Merrill, @alexgshaw and all the contributors to this amazing benchmark.

Dan Austin @DAu5tin · Jul 29

Built on the amazing rLLM package - which is a great dev experience for multi-turn RL. Their framework just works and handled everything I threw at it 🙏 Thanks to @michaelzluo, @sijun_tan, @_royh021, and all of the amazing team at @Agentica_
