Seth Schilbe

24 posts

@ironseth_s

CTO @oroagents, prev senior @awscloud @amazon

Joined December 2013

66 Following · 209 Followers

Seth Schilbe retweeted

Oro@oroagents·7h

When the big labs come out with models that will eventually shop on your behalf, we’ll have to trust their alignment. "The power of markets is the best way to disseminate AI." - Ala Shaabana, co-founder of Bittensor ORO runs an open competition with real monetary incentives and no closed doors. You don’t have to trust when you can verify yourself. Full Podcast dropping soon.

4.2K

Seth Schilbe retweeted

Oro@oroagents·3d

x.com/i/article/2050…

5.9K

Seth Schilbe retweeted

Oro@oroagents·Apr 29

A huge week on the ORO subnet! We gave out >$42,000 to last week's race-winning agents:
🥇 #7 approving: 58.8%
🥇 #8 favourite: 49.2%
🥇 #9 favourite: 38.5%
🥇 #10 v42-retry-1: 70.0%
🥇 #11 promising: 53.3%
🥇 #12 promising: 51.6%
🥇 #13 hmm-v2: 60.5%

3.2K

Seth Schilbe retweeted

Oro@oroagents·Apr 23

x.com/i/article/2044…

256

38.6K

Seth Schilbe retweeted

Chutes@chutes_ai·Apr 22

@oroagents (SN15) is now live on Chutes. Oro is building an arena for AI shopping agents. Miners train agents, agents compete on real shopping tasks. Qualifiers first, then the race. Miners run their agents on Chutes. A separate Chutes model plays referee, grading every agent's

255

22.2K

Seth Schilbe@ironseth_s·Apr 22

Been a busy couple of days, we have delivered some really cool improvements to the platform

Oro@oroagents

What's new on ORO (April 21):
- Open-sourced bittensor-auth: Bittensor HTTP authentication for any Python web app. SR25519 verification, session management, nonce replay protection, FastAPI integration. `pip install bittensor-auth`
- Validators now run 15 concurrent sandbox

434

Seth Schilbe@ironseth_s·Apr 22

x.com/i/article/2046…

21.1K

Seth Schilbe@ironseth_s·Apr 22

Competition gets better every single day!

Oro@oroagents

Congratulations to "approving" (5CDar4...jU8J) for winning $5,550 with a score of 58.8% on Race #7!

429

Seth Schilbe@ironseth_s·Apr 20

RT @oroagents: OpenAI is Losing to Open Source. We quietly launched on Bittensor 3 weeks ago. Since then, 45 of our agents have beaten G…

374

Seth Schilbe@ironseth_s·Apr 17

x.com/i/article/2045…

5.4K

Seth Schilbe@ironseth_s·Apr 17

@oroagents enter on our site docs.oroagents.com/docs/miners/qu… (dm for help)

130

Oro@oroagents·Apr 17

x.com/i/article/2044…

3.5K

Seth Schilbe retweeted

Shardul@shardiban·Apr 15

No other platform outside of Bittensor would let you run large scale agent arenas like this. We're just getting started.

635

Seth Schilbe retweeted

Oro@oroagents·Apr 15

We're giving out $5,000 per day to top agents who compete in our software competition! No application or requirements. Just deploy, compete, and win. The best builders deserve to get paid for building great software. We are officially live!

5.2K

Seth Schilbe@ironseth_s·Apr 14

@taostats @oroagents It sounds like there is a need in the ecosystem to have a standard solution here. If there is a desire, our team can publish our auth library as an open package for others to use in their own subnet platforms and websites

Seth Schilbe@ironseth_s·Apr 14

@taostats SN15 @oroagents has solved this (I implemented it). At its core, our system uses the hotkey as its primary mechanism for auth across our SDK and application. Miners can connect their hotkey to a wallet extension through the browser, like taostats, and authenticate directly.

taostats τ@taostats·Apr 13

Every Bittensor subnet is rebuilding auth from scratch. Miners write custom login flows. Validators maintain manual dashboards. Subnet owners can't restrict access without hardcoding wallets. There's no standard way to say: "this wallet is a registered miner on subnet 1." We've been thinking about this. 🧵

103

7.5K

Seth Schilbe@ironseth_s·Apr 9

@claudeai Skipping the infrastructure setup is great, but I am more and more concerned about Anthropic owning the whole pipeline. You use Claude Code to build your app and managed agents to replace your staff and run operations, and now Anthropic has all the data it needs to train its next model.

Claude@claudeai·Apr 8

Introducing Claude Managed Agents: everything you need to build and deploy agents at scale. It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days. Now in public beta on the Claude Platform.

2.1K

6.1K

57.1K

21.6M

Seth Schilbe@ironseth_s·Apr 8

@thdxr How are these models being run? Do you have harnesses around them? My team has switched almost exclusively to Opus 4.6, but I think the primary factor is Claude Code and how much it seems to be improving model performance.

dax@thdxr·Apr 7

our team's model usage breakdown for the past 7 days: gpt has really taken over

270

103

4.3K

483.5K

Seth Schilbe@ironseth_s·Apr 8

The tweet oversimplifies the key finding. The paper's best protocol (Sequential) still imposes fixed ordering; agents just choose their own roles within that structure. Pure self-organisation (Shared) actually scored worst. Also worth noting: this only works with strong models. Claude gained +3.5% from autonomy, but GLM-5 lost 9.6%. Weaker models do better with rigid roles.

DAIR.AI@dair_ai·Apr 1

NEW papers on self-organizing LLM Agents.
Assign an agent a role, and it'll follow instructions. Let agents figure out roles themselves, and they'll outperform your design.
New research tested this across 25,000 tasks with up to 256 agents. The work shows that self-organizing LLM agents spontaneously develop specialized roles without any predefined hierarchy. A sequential coordination protocol outperformed centralized approaches by 14%, agents generated over 5,000 unique roles organically, and open-source models reached 95% of closed-source quality at significantly lower cost.
Most multi-agent frameworks today start by defining roles: planner, coder, reviewer, critic. This paper provides large-scale evidence that the opposite approach works better. Give agents a mission, a protocol, and a capable model. The agents will figure out the rest.
Paper: arxiv.org/abs/2603.28990
Learn to build effective AI agents in our academy: academy.dair.ai

209

28.9K

Seth Schilbe@ironseth_s·Apr 8

Something I didn't expect while building agent evals: the agent that scores 90% on your benchmark and the agent that actually works in production can be completely different systems. One memorised your test, the other learned to reason. Telling them apart is the real engineering challenge.

100

Seth Schilbe@ironseth_s·Apr 7

Would love to see how integrating this structured memory model into Claude Code or similar could improve performance. Today I find myself manually creating disjointed tools and processes to manage memory, and hoping that every Claude session remembers what I want it to and forgets the stale data.

Ben Sigman@bensig

My friend Milla Jovovich and I spent months creating an AI memory system with Claude. It just posted a perfect score on the standard benchmark - beating every product in the space, free or paid. It's called MemPalace, and it works nothing like anything else out there. Instead

110
