Rohan Arun (@RohanArun) - Twitter Profile | Zamantika Mersobahis Locabet

Pinned Tweet

Side-by-side benchmarks beating @OpenAI Codex computer-use using their own models! 👀
Round 1: Clip a YouTube video from our channel and upload it to TikTok
✅ GetSupers.com + GPT 5.4 + our computer-use-kit: Successfully uploads a clip with subtitles and a hook after 16 minutes (and works on iPhone/Android)
❌ Codex + GPT 5.4: Gets the clip format wrong 3 times, asks for human intervention, and finally fails after 21 minutes.
Codex actually does try iPhone mirroring and CapCut, which is very cool and kudos to the team, but it ultimately fails after burning credits. @sama this is not easy to do, but happy to help you guys integrate our computer-use-kit. 😀
I co-founded the first startup approved by OpenAI to sell GPT-3 for automation in August 2021 (Cheatlayer), months before adept.ai, so I've been working on this for a long time. We automate Mac/Windows/Linux/Chrome/Android/iPhone + @daytonaio sandboxes and @browserbase cloud browsers out of the box.
We also just shipped automated benchmarks, so we're building the most comprehensive computer-use benchmark for long-running tasks on the planet.

English

47

35

100

108.1K

Rohan Arun@RohanArun·3h

Round 2: Download this YouTube video (audio on 😂)
Codex + 5.4 thinking high: fail ❌
Computer-use-kit:
GPT 5.4: 23.4 seconds ✅
Opus 4.7: 14.6 seconds ✅
Qwen 3.6 +: 22.9 seconds ✅
DeepSeek v4 Pro: 15.4 seconds ✅

English

7

3

15

298

Rohan Arun@RohanArun·8h

So way back on December 26th we discovered @GoogleAI Veo 3.1 is actually a physically accurate world model that reconstructs accurate paths around ArUco markers. Guess what new discovery we just made 👀

English

2

1

13

209

Rohan Arun@RohanArun·8h

I can respect that, and by blazing your own trail you can bend reality to your will. However, when iterating to PMF, it's objectively important to focus on the urgent signals to find some underlying truth of the market faster. All these people dunking on you are not the urgent signals I would waste energy on, but you do have an actual breakthrough there in context engineering/expert databases.

English

0

2

36

Garry Tan@garrytan·9h

@RohanArun @ParagArora @im_roy_lee Not my style. It's OK if people don't like me; I'm not here to be liked. I'm here on a mission to build.

English

1

0

1

103

Rohan Arun@RohanArun·9h

@GMMeyer @___4o____ Agreed, but even so, it's disingenuous not to explain that, because the general public has no idea and sees a negative number as bad. My friends have raised at unicorn scale and walked away with nothing due to liquidation preferences, so you and I understand this, obviously.

English

1

0

4

100

Greg Meyer@GMMeyer·9h

@RohanArun @___4o____ At Cursor scale it's a little different: you have to have a really good reason to still have those margins at that size, which is why they weren't able to raise again and had to do an acquisition warrant to raise.

English

1

0

1

123

SPEC@___4o____·10h

YC F25

6

0

208

18.4K

Rohan Arun@RohanArun·9h

@sama Hi send xwing

Nederlands

0

1

10

106

Sam Altman@sama·1d

this was a good week. proud of the team. happy building!

English

594

203

8.3K

274.8K

Rohan Arun@RohanArun·10h

@thsottiaux You can't though 💀

English

4

0

9

114

Tibo@thsottiaux·12h

You can just codex things

English

205

50

1.5K

53.3K

Rohan Arun@RohanArun·10h

@thsottiaux You can't though 💀

English

1

0

5

116

Rohan Arun@RohanArun·10h

@garrytan @ParagArora Yeah, but personally responding and punching down is not how to win social, @garrytan; it will have the opposite effect. Own it like @im_roy_lee and let the criticism flow past you like a chill dude. People will trust your authority more.

English

2

1

7

81

Garry Tan@garrytan·10h

@ParagArora This is totally wrong. Find me another place that can boil down accounting for startups into one screenshot. I'm waiting. x.com/garrytan/statu…

Garry Tan@garrytan

A bunch of people are starting to dunk on this as if us releasing this were bad. If you're 18 and a brilliant engineer, you aren't born with this. This is 101, basic knowledge. And is it complicated? Yes. Accounting you could spend a lifetime learning! Hard to distill to 1 page.

English

1

0

207

Rohan Arun@RohanArun·10h

@alex_prompter Codex needed 4x the time and lost 💀 x.com/RohanArun/stat…

English

5

3

11

126

Alex Prompter@alex_prompter·1d

Deepseek needed 2x the tokens and still lost. But yeah… “4.3x cheaper” 😂 You’re not saving money. You’re scaling bad output faster.

atomic.chat@atomic_chat_hq

Deepseek V4 Pro vs GPT-5.5 in a gamedev contest (full prompt is below) 🏎️
Cost:
Deepseek V4 Pro: $0.07656
GPT-5.5: $0.33063
Output stats:
Deepseek: 34 tok/s · 9m 5s · 18,869 tokens
GPT-5.5: 25 tok/s · 7m 5s · 10,580 tokens
Conclusion: GPT-5.5 clearly made the better karting game. Deepseek V4 Pro was 4.3x cheaper and generated almost 2x more tokens, but the final result was weaker. It struggled with graphics, visual polish, and creative direction, while GPT-5.5 delivered better game quality, better visuals, more creativity, and stronger overall execution. Even though Deepseek positions itself as a strong model for coding, in this gamedev test it still felt far behind GPT-5.5.
Try the same karting prompt with another AI model and share your result below.
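The cost and token figures quoted in that comparison can be cross-checked with a few lines of arithmetic. This is only a sketch: it uses the numbers as quoted in the post, and the variable and function names are illustrative, not taken from any tool mentioned in the thread.

```python
# Figures exactly as quoted in the atomic.chat post; nothing here is measured.
deepseek = {"cost_usd": 0.07656, "tokens": 18_869, "seconds": 9 * 60 + 5}
gpt = {"cost_usd": 0.33063, "tokens": 10_580, "seconds": 7 * 60 + 5}


def compare(cheaper: dict, pricier: dict) -> dict:
    """Return the cost ratio, token ratio, and throughput of each run."""
    return {
        "cost_ratio": pricier["cost_usd"] / cheaper["cost_usd"],
        "token_ratio": cheaper["tokens"] / pricier["tokens"],
        "cheaper_tok_per_s": cheaper["tokens"] / cheaper["seconds"],
        "pricier_tok_per_s": pricier["tokens"] / pricier["seconds"],
    }


summary = compare(deepseek, gpt)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

On the quoted figures the cost ratio is about 4.3 and the token ratio is a little under 1.8, which is consistent with the post's "4.3x cheaper" and "almost 2x more tokens".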

English

17

3

87

19.8K

Rohan Arun@RohanArun·1d

@AlexanderTw33ts I'll pay $0.10 on the dollar; it might run 5 benchmarks for us 😂

English

1

0

7

455

Alex@AlexanderTw33ts·1d

guys what should I do with this?

English

75

0

108

13.3K

Rohan Arun@RohanArun·1d

🚨Striking new benchmarks for long-running computer-use beating @OpenAI with their own models! 3 random people who follow me, comment, and quote repost the tweet below win $100 in 48 hours! X will be removing communities soon, so follow me anyway to stay tuned for more launches and promos.

English

13

11

31

1.2K

Rohan Arun@RohanArun·1d

🚨Striking new benchmarks for long-running computer-use beating @OpenAI with their own models!
❌ Codex Computer Use: 21 minutes and fails
Computer-use-kit + our native Mac app:
✅ @Alibaba_Qwen Qwen 3.6 Plus: 5m 27s success $0.325/$1.95 👀
✅ @deepseek_ai V4: 3m 34s success $1.74/$3.48 👀
✅ GPT 5.4: 2m 41s success $2.50/$15
✅ GPT 5.5 Pro: 4m 34s success $30/$180
Task: clip the latest video from our YouTube channel and post it to TikTok.
We just published a new realtime upgrade to optimize our computer-use-kit runtimes (API launching soon). We finally solved reliable computer-use for long-running tasks, and we use benchmarks to rigorously test and report which use-cases will work best on which models. The cool thing is our benchmarks clearly show an upper limit to tasks, so you can use open models to run them!
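To put the run times above on one scale, here is a rough sketch that converts the quoted durations to seconds and compares each against the 21-minute Codex run. The helper name and dictionary layout are hypothetical, and because the Codex run ended in failure, each ratio is against time-to-failure rather than a like-for-like completion time.

```python
import re


def to_seconds(label: str) -> int:
    """Convert a duration such as '5m 27s' or '21m' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?\s*(?:(\d+)s)?", label.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {label!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


# Durations as quoted in the tweet; the Codex figure is time until failure.
codex_seconds = to_seconds("21m")
kit_runs = {
    "Qwen 3.6 Plus": "5m 27s",
    "DeepSeek V4": "3m 34s",
    "GPT 5.4": "2m 41s",
    "GPT 5.5 Pro": "4m 34s",
}

# How many times longer the Codex run lasted than each successful run.
ratios = {model: codex_seconds / to_seconds(t) for model, t in kit_runs.items()}

for model, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: {to_seconds(kit_runs[model])} s, Codex ran {ratio:.1f}x longer")
```

On these figures the fastest run (GPT 5.4) finishes in roughly one eighth of the time Codex spent before failing.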

English

18

15

36

2.9K

Rohan Arun@RohanArun·1d

Just 5 days ago this video went viral for beating @OpenAI computer-use using their own models in 16 minutes on long-running tasks.. @deepseek_ai v4 runs the same benchmark now with our latest realtime computer-use breakthrough in ~3 minutes 34 seconds! 🤯 More striking benchmarks coming soon..

English

4

3

19

583

Rohan Arun@RohanArun·2d

That's what I'm talking about! Won #1 twice in the Launch pitch competition this week!

LAUNCH@LAUNCH

@murr: 🥇@Super_Powers_AI: A combination of free, open-source models and parallelizing agents is very powerful. 🥈@ArgonathAI: Removing bottlenecks in defence is important. 🥉@Yorby_ai: Distribution is the most important thing to solve for companies.

English

13

3

35

950

Rohan Arun@RohanArun·2d

Twitter is taking down communities! The other way to basically emulate a community is if everyone follows me and subscribes to notifications, and I can pin posts on my profile twitter.com/@RohanArun

Nikita Bier@nikitabier

Today we're announcing two product changes for organizing communities on X: 1. XChat now supports joinable links for groupchats. Create a public link & share direct to Timeline. With support for 350 members per chat (and growing), Groupchat Links are the fastest way to bring people together on X. 2. Due to declining usage, we're deprecating X Communities on May 6. To migrate your Community's members, pin your groupchat link so people can join it over the next 2 weeks. This is part of our broader effort to simplify the experience on X. Make no mistake: we are investing heavily in niche communities with the launch of Custom Timelines—and much more to come.

English

5

0

17

409

Rohan Arun@RohanArun·2d

LFG that's what I'm talking about!

LAUNCH@LAUNCH

@mikemarg_: 🥇@Super_Powers_AI: There is no substitute for strong open-source traction. 🥈@askgrapple: Likes the focus on solving analytics in the age of AI. 🥉@askgrapple: Has a very clear ICP, and the buying process in the federal government is stuck in the past.

English

8

2

23

350
