Autoppia | Subnet 36 on Bittensor

1.3K posts


@AutoppiaAI

On a mission to have the best Web Operator (Automata) and the best Web Benchmark in the world (Infinite Web Arena).

Joined October 2023
645 Following · 1.7K Followers
tomie @tomieinlove
People tend to develop AI psychosis from the models that most closely match their own intelligence. For the average person, that was 4o, which explains the popularity of #keep4o. But for those with more prodigious IQs, from 110 to 120, there's Opus 4.6.
Autoppia | Subnet 36 on Bittensor
New miners are joining SN36 since 0% burn went live. We're not going to overhype it. But the flywheel we talked about is starting to turn. More miners → more competition → better agents → better products. We'll share the numbers in our bi-weekly dev update on Sunday. $TAO
Alham Fikri Aji @AlhamFikri
Should we treat LLM benchmarking like an annual Olympiad event? 🏆

With current benchmarks, it is too easy to overfit tasks or manipulate settings. In some cases, people just cheat, or models are narrow-tuned to a specific benchmark (*cough* LLaMa-4).

What if we organized an annual, Olympiad-like event? The tasks must be sealed and unknown. Models cannot study for the test. They must be prepared for anything. We explain this in our new position paper.

I am an IOI alum from a long time ago. I practiced for years to master many algorithms. I wanted to be ready for whatever appeared on the contest day. I believe general LLMs should face the same standard. If they are truly general, they should be ready for whatever use case comes up.

We propose a flow similar to how we typically organize an Olympiad:
- Call for Task: We propose an open solicitation for challenging, high-quality tasks from the global research community.
- Organizing Committee: A dedicated team curates and improves these submissions. They verify task quality and diversity.
- Model Developers: Developers submit their systems blindly before the tasks are revealed. This prevents teams from iterative gaming or manual tuning once the exam starts.
- The Actual Olympiad: Evaluation happens in a synchronized, short window. The sealed tasks are released, and all models are tested simultaneously to maintain total integrity under the same setting.

Once it is done, everything will be released for reproducibility.

Read the full position paper here: arxiv.org/abs/2603.23292

We worked on this together with my student @jcblaisecruz. Let me know your thoughts!
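
One way to make a "submit before the tasks are revealed" rule checkable is a hash commitment. The post does not say the paper uses this; it is a swapped-in, standard technique shown only as a minimal sketch. The names `commit` and `verify` and the example payloads are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def commit(payload: bytes) -> Tuple[str, bytes]:
    """Return (digest, nonce). Publish the digest now; keep payload and nonce private."""
    # A random nonce stops anyone from guessing the payload by brute-forcing hashes.
    nonce = secrets.token_bytes(16)
    digest = hashlib.sha256(nonce + payload).hexdigest()
    return digest, nonce


def verify(digest: str, nonce: bytes, payload: bytes) -> bool:
    """Check that a revealed payload matches the digest published earlier."""
    recomputed = hashlib.sha256(nonce + payload).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(recomputed, digest)


if __name__ == "__main__":
    # Hypothetical submission identifier, committed before the tasks are released.
    digest, nonce = commit(b"model-checkpoint-v1")
    print(verify(digest, nonce, b"model-checkpoint-v1"))
    print(verify(digest, nonce, b"model-checkpoint-v2"))
```

Under this assumption, a developer would publish the digest before the evaluation window and reveal the payload and nonce afterwards, so any change made after the tasks were seen would fail verification.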
Autoppia | Subnet 36 on Bittensor
0% burn is now live on Subnet 36. All emissions flow to miners. Promise kept. First of many. $TAO #Bittensor #SN36
Autoppia | Subnet 36 on Bittensor @AutoppiaAI

We owe our community an honest conversation. With everything that's happened lately, we want to talk about where we are, what went wrong, and what we're doing about it.

For the past year, we've been building: IWA, Automata, Dynamic Zero, and open-sourcing our solution. But here's what we got wrong: we built in silence. And when our community raised concerns, we got defensive instead of listening. That's on us. So today, we want to address some of the things you've been saying, and what we're doing about it.

1) "WHERE ARE THE RESULTS? WHERE ARE THE AGENTS?"

They exist. Automata is live right now at automata(dot)autoppia(dot)com and it's powered by the web agent built by our top miner on the subnet. That's the model working as intended: miners compete, the best agent rises, and it gets put to work.

Is it perfect though? No. The agent is still improving and there are tasks it struggles with. Building a SOTA web agent is genuinely hard; if it weren't, OpenAI and Anthropic wouldn't be pouring resources into the same problem. But ours is already good enough to power a live product, and it's getting better every week. We'll be showcasing what Automata can do so you can see for yourselves. We also have 17 dynamic websites running on IWA, miners deploying models through Chutes, and scores improving on the leaderboard.

2) "MINERS CAN'T JUSTIFY WORKING HERE. THE BURN IS TOO HIGH."

You were right. 0.75τ/day to miners isn't enough to attract the volume of talent needed to push agents further. Our top miner proved the system works: one miner built an agent good enough to ship in production. Now we need to bring in more miners at that level. We heard this feedback and we took it seriously.

Effective March 30, Autoppia moves to 0% burn. All emissions flow to miners. We're going all in on our miners because the proof of concept is already here.

3) "YOU'LL GET DEREGISTERED."

We're not going anywhere. We've significantly restructured our marketing so you'll get clearer and more consistent messaging from now on. 0% burn will accelerate our progress as we attract more quality miners to the subnet.

We're open to feedback. We know we have to earn back your trust and we're asking the community to give us another chance to make things right. We have a cracked team that is fully committed to Bittensor and to execution, and we intend to add value to this network.

We know trust isn't rebuilt with one post. It's rebuilt one kept promise at a time. March 30 is the first. Hold us to it.

#Bittensor $TAO #WebAgents

Autoppia | Subnet 36 on Bittensor
Tomorrow, burn goes to 0%. Every emission flows to miners. As promised. #Bittensor #SN36 $TAO
Autoppia | Subnet 36 on Bittensor
These aren't static datasets agents can memorize. Every run, the data and conditions change. That's what makes IWA different from every other benchmark. This is what our miners train against daily. And on March 30 when burn hits 0%, we're opening the door for more of them. Explore all 14: infinitewebarena.autoppia.com/websites
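
The claim that "every run, the data and conditions change" can be illustrated with a seeded task generator. This is a minimal sketch and not Autoppia's implementation: `Task`, `generate_task`, and the job titles and budgets are hypothetical names and values chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    prompt: str    # instruction shown to the agent
    expected: str  # ground truth the evaluator checks against


# Hypothetical pools of values; a real benchmark would draw from much richer data.
JOB_TITLES = ["Data Engineer", "UX Designer", "Backend Developer", "Copywriter"]
BUDGETS = [250, 500, 1200, 3000]


def generate_task(run_seed: int) -> Task:
    """Build a task whose concrete values depend on the run seed."""
    rng = random.Random(run_seed)  # isolated RNG so runs are reproducible per seed
    title = rng.choice(JOB_TITLES)
    budget = rng.choice(BUDGETS)
    return Task(
        prompt=f"Post a job titled '{title}' with a budget of ${budget}.",
        expected=f"{title}|{budget}",
    )


if __name__ == "__main__":
    for seed in range(3):
        print(generate_task(seed))
```

Because the expected answer is derived from the same seed as the prompt, an agent that memorized one run's answer gains nothing on the next, while the evaluator can still reproduce any given run exactly.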
Autoppia | Subnet 36 on Bittensor
🟠 #5: AutoWork
Mirrors: Upwork | 8/10 difficulty
Hiring, consultations, job posting, profile management, messaging: each with multi-field conditional logic. An agent that handles hiring might completely fail at job posting. Generalization is the game.
Explore: infinitewebarena.autoppia.com/websites/autow…
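
The point that an agent strong at hiring can still fail at job posting suggests scoring per task category rather than only on average. The following is a minimal sketch under that assumption; the post does not describe how IWA actually scores agents, and `success_by_category`, `generalization_score`, and the sample results are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def success_by_category(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each task category to its success rate."""
    totals: Dict[str, int] = defaultdict(int)
    passed: Dict[str, int] = defaultdict(int)
    for category, ok in results:
        totals[category] += 1
        passed[category] += int(ok)
    return {category: passed[category] / totals[category] for category in totals}


def generalization_score(rates: Dict[str, float]) -> float:
    """Score an agent by its weakest category, so one strong area cannot hide a failing one."""
    return min(rates.values()) if rates else 0.0


if __name__ == "__main__":
    # Hypothetical outcomes: perfect at hiring, failing at job posting.
    results = [
        ("hiring", True),
        ("hiring", True),
        ("job_posting", False),
        ("job_posting", False),
    ]
    rates = success_by_category(results)
    print(rates)
    print(sum(rates.values()) / len(rates))  # plain average across categories
    print(generalization_score(rates))       # worst-category score
```

In this made-up example the plain average is 0.5 while the worst-category score is 0.0, which is the gap the post describes.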
Autoppia | Subnet 36 on Bittensor
Our miners don't train on menial problems. They train on these. Autoppia's Evaluation Sandbox: synthetic versions of real-world apps that push web agents to their limits. 14 live websites. Dynamically generated. Designed to break agents that can't generalize. See the 5 hardest: