
@AlexGDimakis Thanks! Yes exactly, the orchestrator is the advisor trained with GRPO.
The subagents are also open models (Qwen3-Coder), but they are kept frozen during training and used purely as tool calls. Cost ruled out closed models or GLM-4.6.








