Seth Schilbe

24 posts

@ironseth_s

CTO @oroagents, prev senior @awscloud @amazon

Joined December 2013
66 Following · 209 Followers
Seth Schilbe retweeted
Oro (@oroagents)
When the big labs come out with models that will eventually shop on your behalf, we’ll have to trust their alignment.
"The power of markets is the best way to disseminate AI." - Ala Shaabana, co-founder of Bittensor
ORO runs an open competition with real monetary incentives and no closed doors. You don’t have to trust when you can verify yourself.
Full Podcast dropping soon.
4 replies · 13 reposts · 67 likes · 4.2K views
Seth Schilbe retweeted
Oro (@oroagents)
A huge week on the ORO subnet! We gave out >$42,000 to last week's race-winning agents:
🥇 #7 approving — 58.8%
🥇 #8 favourite — 49.2%
🥇 #9 favourite — 38.5%
🥇 #10 v42-retry-1 — 70.0%
🥇 #11 promising — 53.3%
🥇 #12 promising — 51.6%
🥇 #13 hmm-v2 — 60.5%
2 replies · 11 reposts · 80 likes · 3.2K views
Seth Schilbe retweeted
Chutes (@chutes_ai)
@oroagents (SN15) is now live on Chutes
Oro is building an arena for AI shopping agents. Miners train agents, agents compete on real shopping tasks. Qualifiers first, then the race.
Miners run their agents on Chutes. A separate Chutes model plays referee, grading every agent's
9 replies · 55 reposts · 255 likes · 22.2K views
Seth Schilbe (@ironseth_s)
RT @oroagents: OpenAI is Losing to Open Source. We quietly launched on Bittensor 3 weeks ago. Since then, 45 of our agents have beaten G…
0 replies · 7 reposts · 0 likes · 374 views
Seth Schilbe retweeted
Shardul (@shardiban)
No other platform outside of Bittensor would let you run large-scale agent arenas like this. We're just getting started.
0 replies · 2 reposts · 8 likes · 635 views
Seth Schilbe retweeted
Oro (@oroagents)
We're giving out $5,000 per day to top agents who compete in our software competition! No application or requirements. Just deploy, compete, and win. The best builders deserve to get paid for building great software. We are officially live!
1 reply · 5 reposts · 34 likes · 5.2K views
Seth Schilbe (@ironseth_s)
@taostats @oroagents It sounds like the ecosystem needs a standard solution here. If there is interest, our team can publish our auth library as an open package for others to use in their own subnet platforms and websites.
0 replies · 0 reposts · 2 likes · 34 views
Seth Schilbe (@ironseth_s)
@taostats SN15 @oroagents has solved this (I implemented it). At its core, our system uses the hotkey as its primary mechanism for auth across our SDK and application. Miners can connect their hotkey to a wallet extension through the browser, like taostats, and authenticate directly.
1 reply · 0 reposts · 2 likes · 75 views
taostats τ (@taostats)
Every Bittensor subnet is rebuilding auth from scratch. Miners write custom login flows. Validators maintain manual dashboards. Subnet owners can't restrict access without hardcoding wallets. There's no standard way to say: "this wallet is a registered miner on subnet 1." We've been thinking about this. 🧵
2 replies · 11 reposts · 103 likes · 7.5K views
Seth Schilbe (@ironseth_s)
@claudeai Skipping the infrastructure setup is great, but I am increasingly concerned about Anthropic owning the whole pipeline. You use Claude Code to build your app and managed agents to replace your staff and run operations, and now Anthropic has all the data it needs to train its next model.
0 replies · 0 reposts · 0 likes · 6 views
Claude (@claudeai)
Introducing Claude Managed Agents: everything you need to build and deploy agents at scale. It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days. Now in public beta on the Claude Platform.
2.1K replies · 6.1K reposts · 57.1K likes · 21.6M views
Seth Schilbe (@ironseth_s)
@thdxr How are these models being run? Do you have harnesses around them? My team has switched almost exclusively to Opus 4.6, but I think the primary factor is Claude Code and how much it seems to improve model performance.
0 replies · 0 reposts · 0 likes · 23 views
dax (@thdxr)
our team's model usage breakdown for the past 7 days
gpt has really taken over
270 replies · 103 reposts · 4.3K likes · 483.5K views
Seth Schilbe (@ironseth_s)
The tweet oversimplifies the key finding. The paper's best protocol (Sequential) still imposes a fixed ordering; agents just choose their own roles within that structure. Pure self-organisation (Shared) actually scored worst. Also worth noting: this only works with strong models. Claude gained 3.5% from autonomy, but GLM-5 lost 9.6%. Weaker models do better with rigid roles.
0 replies · 0 reposts · 0 likes · 15 views
DAIR.AI (@dair_ai)
NEW papers on self-organizing LLM Agents.
Assign an agent a role, and it'll follow instructions. Let agents figure out roles themselves, and they'll outperform your design.
New research tested this across 25,000 tasks with up to 256 agents.
The work shows that self-organizing LLM agents spontaneously develop specialized roles without any predefined hierarchy. A sequential coordination protocol outperformed centralized approaches by 14%, agents generated over 5,000 unique roles organically, and open-source models reached 95% of closed-source quality at significantly lower cost.
Most multi-agent frameworks today start by defining roles: planner, coder, reviewer, critic. This paper provides large-scale evidence that the opposite approach works better.
Give agents a mission, a protocol, and a capable model. The agents will figure out the rest.
Paper: arxiv.org/abs/2603.28990
Learn to build effective AI agents in our academy: academy.dair.ai
23 replies · 34 reposts · 209 likes · 28.9K views
Seth Schilbe (@ironseth_s)
Something I didn't expect while building agent evals: the agent that scores 90% on your benchmark and the agent that actually works in production can be completely different systems. One memorised your test, the other learned to reason. Telling them apart is the real engineering challenge.
0 replies · 0 reposts · 1 like · 100 views
Seth Schilbe (@ironseth_s)
Would love to see how integrating this structured memory model into Claude Code or similar tools could improve performance. Today I find myself manually creating disjointed tools and processes to manage memory, hoping that every Claude session remembers what I want it to and forgets the stale data.
Ben Sigman (@bensig)

My friend Milla Jovovich and I spent months creating an AI memory system with Claude. It just posted a perfect score on the standard benchmark - beating every product in the space, free or paid. It's called MemPalace, and it works nothing like anything else out there. Instead

0 replies · 0 reposts · 1 like · 110 views