Duckworth

158 posts

Duckworth

@Duckworth65

here for the tek || Big Fan of Superpowers Ai

Joined January 2025
113 Following · 122 Followers
Pinned Tweet
Duckworth
Duckworth@Duckworth65·
Rockefeller was right: too many people overestimate what they lack and underestimate what they have. You can make big money even in your current situation. You don’t need as much as you think.
6
1
12
829
Rohan Arun
Rohan Arun@RohanArun·
🚨Striking new benchmarks for long-running computer-use beating @OpenAI with their own models! 3 random people who follow me, comment, and quote repost the tweet below win $100 in 48 hours! X will be removing communities soon, so follow me anyway to stay tuned for more launches and promos.
Rohan Arun@RohanArun

🚨Striking new benchmarks for long-running computer-use beating @OpenAI with their own models! ❌ Codex Computer Use: 21 minutes and fails. Computer-use-kit + our native Mac app: ✅ @Alibaba_Qwen Qwen 3.6 Plus: 5m 27s success $0.325/$1.95👀 ✅ @deepseek_ai V4: 3m 34s success. $1.74/$3.48 👀 ✅ GPT 5.4: 2m 41s success $2.50/$15 ✅ GPT 5.5 Pro: 4m 34s success $30/$180 Task: clip the latest video from our YouTube channel and post it to TikTok. We just published a new realtime upgrade to optimize our computer-use-kit runtimes (API launching soon). We finally solved reliable computer-use for long-running tasks, and we use benchmarks to rigorously test and report which use-cases will work best on which models. The cool thing is our benchmarks clearly show an upper limit to tasks, so you can use open models to run them!

13
11
31
1.2K
Tibo
Tibo@thsottiaux·
Hello builders. What are we getting wrong with Codex, what can we improve?
2.5K
64
2.9K
323.5K
Duckworth reposted
Rohan Arun
Rohan Arun@RohanArun·
For VERY INTERESTING reasons, @nvidia Nemotron Super 3 outperforms GPT 5.4 in long-running computer-use. This trick applies to more open models, and it's a consequence of how the models fundamentally work, so this will keep happening.
11
6
31
816
Duckworth reposted
FairGambling
FairGambling@fairgambling·
Never gamble without us again. Launching tomorrow.
5.4K
4.3K
5K
389.4K
Rohan Arun
Rohan Arun@RohanArun·
Back in SF! The game is afoot!
13
3
29
480
Rohan Arun
Rohan Arun@RohanArun·
@gdb I can make it faster and more robust. Put me in the game coach 🫡 Check out our side-by-side benchmark below beating Codex with its own model. x.com/Viewforge/stat…
Rohan Arun@RohanArun

Side-by-side benchmarks beating @OpenAI Codex computer-use using their own models! 👀 Round 1: Clip a YouTube video from our channel and upload it to TikTok ✅ GetSupers.com + GPT 5.4 + our computer-use-kit: Successfully uploads a clip with subtitles and hook after 16 minutes (and works in iPhone/Android) ❌ Codex + GPT 5.4: Gets the clip format wrong 3 times, asks for human intervention, and finally fails after 21 minutes. Codex actually does try iPhone mirroring and CapCut, which is very cool and kudos to the team, but it ultimately fails after burning credits. @sama this is not easy to do, but happy to help you guys integrate our computer-use-kit. 😀 I co-founded the first startup approved by OpenAI to sell GPT-3 for automation in August 2021 (Cheatlayer), months before adept.ai, so I've been working on this for a long time. We automate Mac/Windows/Linux/Chrome/Android/iPhone + @daytonaio sandboxes and @browserbase cloud browsers out of the box. We also just shipped automated benchmarks, so we're building the most comprehensive computer-use benchmark for long-running tasks on the planet.

2
0
12
969
Rohan Arun
Rohan Arun@RohanArun·
@evaedxn Just sold my last AI marketing startup, and now we're competing directly with OpenAI to unlock AI for consumers through reliable computer-use. This automates the apps they use daily, so it unlocks AI for most non-technical people (most of the world). x.com/Viewforge/stat…
Rohan Arun@RohanArun

Side-by-side benchmarks beating @OpenAI Codex computer-use using their own models! 👀 Round 1: Clip a YouTube video from our channel and upload it to TikTok ✅ GetSupers.com + GPT 5.4 + our computer-use-kit: Successfully uploads a clip with subtitles and hook after 16 minutes (and works in iPhone/Android) ❌ Codex + GPT 5.4: Gets the clip format wrong 3 times, asks for human intervention, and finally fails after 21 minutes. Codex actually does try iPhone mirroring and CapCut, which is very cool and kudos to the team, but it ultimately fails after burning credits. @sama this is not easy to do, but happy to help you guys integrate our computer-use-kit. 😀 I co-founded the first startup approved by OpenAI to sell GPT-3 for automation in August 2021 (Cheatlayer), months before adept.ai, so I've been working on this for a long time. We automate Mac/Windows/Linux/Chrome/Android/iPhone + @daytonaio sandboxes and @browserbase cloud browsers out of the box. We also just shipped automated benchmarks, so we're building the most comprehensive computer-use benchmark for long-running tasks on the planet.

6
2
18
761
eva edxn
eva edxn@evaedxn·
has anyone changed their lives with an AI agent yet?
133
16
227
8.6K
Duckworth
Duckworth@Duckworth65·
@rohanarun Woahhhhh 🥹🔥 Finally I won after so many attempts, @rohanarun I can’t dm cause you don’t follow me boss
1
0
4
77
Rohan Arun
Rohan Arun@RohanArun·
@thsottiaux I can make it faster and more robust. Put me in the game coach 🫡 Check out our side-by-side benchmark below beating Codex with its own model. x.com/Viewforge/stat…
Rohan Arun@RohanArun

Side-by-side benchmarks beating @OpenAI Codex computer-use using their own models! 👀 Round 1: Clip a YouTube video from our channel and upload it to TikTok ✅ GetSupers.com + GPT 5.4 + our computer-use-kit: Successfully uploads a clip with subtitles and hook after 16 minutes (and works in iPhone/Android) ❌ Codex + GPT 5.4: Gets the clip format wrong 3 times, asks for human intervention, and finally fails after 21 minutes. Codex actually does try iPhone mirroring and CapCut, which is very cool and kudos to the team, but it ultimately fails after burning credits. @sama this is not easy to do, but happy to help you guys integrate our computer-use-kit. 😀 I co-founded the first startup approved by OpenAI to sell GPT-3 for automation in August 2021 (Cheatlayer), months before adept.ai, so I've been working on this for a long time. We automate Mac/Windows/Linux/Chrome/Android/iPhone + @daytonaio sandboxes and @browserbase cloud browsers out of the box. We also just shipped automated benchmarks, so we're building the most comprehensive computer-use benchmark for long-running tasks on the planet.

8
0
18
1.2K
Tibo
Tibo@thsottiaux·
If you think Codex with GPT-5.4 is /fast already… we have line of sight for at least an order of magnitude in speedups this year. Good engineering compounds and it’s never been a better time to learn Codex.
357
188
5.7K
289.2K
Rohan Arun
Rohan Arun@RohanArun·
Side-by-side benchmarks beating @OpenAI Codex computer-use using their own models! 👀 Round 1: Clip a YouTube video from our channel and upload it to TikTok ✅ GetSupers.com + GPT 5.4 + our computer-use-kit: Successfully uploads a clip with subtitles and hook after 16 minutes (and works in iPhone/Android) ❌ Codex + GPT 5.4: Gets the clip format wrong 3 times, asks for human intervention, and finally fails after 21 minutes. Codex actually does try iPhone mirroring and CapCut, which is very cool and kudos to the team, but it ultimately fails after burning credits. @sama this is not easy to do, but happy to help you guys integrate our computer-use-kit. 😀 I co-founded the first startup approved by OpenAI to sell GPT-3 for automation in August 2021 (Cheatlayer), months before adept.ai, so I've been working on this for a long time. We automate Mac/Windows/Linux/Chrome/Android/iPhone + @daytonaio sandboxes and @browserbase cloud browsers out of the box. We also just shipped automated benchmarks, so we're building the most comprehensive computer-use benchmark for long-running tasks on the planet.
47
35
100
108.2K
Duckworth
Duckworth@Duckworth65·
Woah! @rohanarun just dropped automated benchmarks for long-running computer-use AI agents — and the open-source Hermes 3.1 405B is outperforming GPT-5.4 on real GUI tasks like posting to Twitter and TikTok. This is next-level agent testing. 🔥
Rohan Arun@RohanArun

Introducing automated benchmarks for long-running computer-use agents, automatically generating the most comprehensive computer-use benchmark on the planet. Using our computer-use kit, @NousResearch Hermes 405B instruct free is performing better than @OpenAI GPT 5.4.... 👀

10
1
62
269.8K
Duckworth
Duckworth@Duckworth65·
@rohanarun just dropped automated benchmarks for long-running computer-use AI agents — and the open-source Hermes 3.1 405B is outperforming GPT-5.4 on real GUI tasks like posting to Twitter and TikTok. This is next-level agent testing. 🔥
Rohan Arun@RohanArun

Introducing automated benchmarks for long-running computer-use agents, automatically generating the most comprehensive computer-use benchmark on the planet. Using our computer-use kit, @NousResearch Hermes 405B instruct free is performing better than @OpenAI GPT 5.4.... 👀

0
0
0
6