Elie Steinbock — oss/acc

34.7K posts


@elie2222

Building https://t.co/0MTUhgDLIE, your executive assistant for email. 15k users. OSS | Cursor Ambassador | YouTube on open source: https://t.co/qf66pPJzgf

Tel Aviv · Joined June 2010

3.2K Following · 13.7K Followers

Pinned Tweet

Elie Steinbock — oss/acc@elie2222·3 Mar

OMG I got Qwen 3.5 4b running on my emails now
And it's handling them correctly on just 5GB RAM 🤯
using @inboxzero_ai as the harness


240

29.7K

Elie Steinbock — oss/acc@elie2222·3h

@euboid gpt5.4 mini hallucinates less?


Wilson Wilson@euboid·4h

@elie2222 I find it hallucinates way too much in internal benchmarks. So much that I can't trust it for almost any use-case 😅


Elie Steinbock — oss/acc@elie2222·13h

Gemini 3 Flash still a GOAT


530

Elie Steinbock — oss/acc@elie2222·6h

@wesbos Need to use xhigh and high thinking on the next test 🫡


713

Wes Bos@wesbos·6h

Composer 2 vs Opus 4.6 vs GPT 5.4 - a totally unscientific test

> Create a Twitter clone. Use Better Auth, Vite, Sqlite, Drizzle, Typescript, and React with Tanstack Start.

Each one took ~5 mins in plan mode. Each had access to a browser to test.

Composer: 5 mins, $6.04, 1,250 LOC
Opus: 19 mins, $10.43, 1,000 LOC
GPT: 22 mins, $14.15, 2,000 LOC

Opus seems to use the cache WAY more than composer, so it's not really 10x more expensive.

Composer app ran first try. Other two needed a bit of CORS debugging but did work. Code between all was extremely similar.

All three done inside cursor - so Claude Code / Codex may have been different.

Models used:
Composer 2.0 (regular, not fast)
Opus 4.6 Medium Thinking
GPT 5.4 Medium Thinking

Wes Bos@wesbos

Cursor just launched Composer 2 - their own model. It's 10× cheaper than Opus 4.6 and supposed to rival it. I've been using it for a few days, I don't have any skewed graphs to show you but from a pure vibes POV I can tell you it's pretty good™ My litmus test right now is if it can build a 3D Printable model with Manifold CAD. I build a Gif zoetrope generator and it did fantastic.


738

180.6K

Elie Steinbock — oss/acc@elie2222·6h

@harriskennyx @inboxzero_ai ya. solid option to use in product. i wouldn't worry too much about exact model. best model for price changes the whole time. so you want to be adaptable. if you use openrouter/vercel ai gateway/ai sdk, it makes it easy to switch to the best option when things change
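
The "stay adaptable" point above can be sketched without committing to any particular gateway. This is a minimal illustration only, not Inbox Zero's actual setup: the task names, the `gateway-a`/`gateway-b` providers, the model ids, and the override string format are all hypothetical placeholders. The idea is that model choice lives in one routing table, so switching to whatever is best for the price becomes a config change rather than a code change.

```typescript
// Hypothetical sketch: all names below are placeholders, not real providers or models.
type Task = "classify" | "draft";

interface ModelChoice {
  provider: string; // which gateway or provider to call (illustrative)
  model: string;    // provider-specific model id (illustrative)
}

// Default routing table: one place to change when a better-priced model appears.
const defaults: Record<Task, ModelChoice> = {
  classify: { provider: "gateway-a", model: "cheap-fast-model" },
  draft: { provider: "gateway-a", model: "stronger-model" },
};

function isTask(value: string): value is Task {
  return value === "classify" || value === "draft";
}

// Parse an override string such as "classify=gateway-b/other-model,draft=gateway-a/x".
// Malformed entries are skipped so a bad config falls back to the defaults.
function parseOverrides(raw?: string): Partial<Record<Task, ModelChoice>> {
  const out: Partial<Record<Task, ModelChoice>> = {};
  if (!raw) return out;
  for (const entry of raw.split(",")) {
    const [task, target] = entry.trim().split("=");
    if (!task || !target || !isTask(task)) continue;
    const slash = target.indexOf("/");
    if (slash <= 0 || slash === target.length - 1) continue;
    out[task] = {
      provider: target.slice(0, slash),
      model: target.slice(slash + 1),
    };
  }
  return out;
}

// Resolve the model for a task: an override wins, otherwise use the default.
function resolveModel(task: Task, rawOverrides?: string): ModelChoice {
  return parseOverrides(rawOverrides)[task] ?? defaults[task];
}
```

In practice the override string would likely come from an environment variable or a settings table, and the resolved `provider`/`model` pair would be handed to whichever SDK or gateway client the product already uses.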


Harris Kenny@harriskennyx·6h

@elie2222 @inboxzero_ai this is exactly what i was thinking… i have some things i want to use AI for in our product but would be potentially very high volume… this might be it!


Elie Steinbock — oss/acc@elie2222·7h

@wickedguro @euboid @getsentry Ah okay. So potentially look at Claude SDK. Why you thinking Codex SDK over it?


Nevo David@wickedguro·9h

@elie2222 @euboid @getsentry Nah, I need the agent to find the problems


Wilson Wilson@euboid·14h

Has anybody figured out how to do this?
- @getsentry issue reported
- codex agent spun up with access to sentry + axiom logs & traces
- Draft PR auto-created w/ root cause analysis + fix


22.9K

Elie Steinbock — oss/acc@elie2222·7h

@harriskennyx @inboxzero_ai You need good quality at a good price point. 3 Flash is very strong for the price. There is a new wave of models like Kimi/Qwen/Minimax that may even be stronger. But privacy concerns are the problem there.


Elie Steinbock — oss/acc@elie2222·7h

@harriskennyx So for your day to day dev, I wouldn't recommend it. It's fine and cheap. But just use the frontier models for that. But if you have an AI product and you're spending thousands on tokens. eg. processing millions of emails as we do for @inboxzero_ai, then you don't need Opus.


Elie Steinbock — oss/acc@elie2222·7h

This is massive! I'm yet to be convinced it's as strong as Opus 4.6. But it's strong, and price point is very good. I need to test more to see if it really competes with Opus.

But why this is so big: #1 reason people have been moving away from Cursor is the price. Anthropic and OpenAI have been massively subsidizing tokens. Cursor had to resell someone else's model.

With this upgrade, Cursor can finally sell their own model. But sell a version that's stronger and cheaper. Long term this is a huge advantage.

Cursor@cursor_ai

Composer 2 is now available in Cursor.


Elie Steinbock — oss/acc@elie2222·9h

@wickedguro @euboid @getsentry Also I feel like if you actually just want to make API calls then you don't need the CLI or SDK either way.


Elie Steinbock — oss/acc@elie2222·9h

@wickedguro @euboid @getsentry Ah, we weren't talking about customer support above. But simple approach that'll work across Codex/Claude Code is to write a skill, simple CLI it can call. That's about it. Also look at Claude SDK if you haven't already.


Elie Steinbock — oss/acc@elie2222·10h

@wickedguro @euboid @getsentry Why not just use the CLI direct?


Nevo David@wickedguro·10h

@elie2222 @euboid @getsentry This is interesting. I was thinking of using this (@openai/codex-sdk): npmjs.com/package/@opena… But, it's CLI-based, so I'm not sure how well I can run it with Docker on a server. This cursor automation stuff sounds good


Elie Steinbock — oss/acc@elie2222·10h

@concaption thanks!


Usama Navid@concaption·12h

@elie2222 i watched this


Elie Steinbock — oss/acc@elie2222·13h

1 of 10 video 😍


235

Elie Steinbock — oss/acc@elie2222·10h

Inbox Zero CLI listed on Cursor Directory 😍


679

Elie Steinbock — oss/acc@elie2222·12h

@ctatedev @nbaschez Would love to hear any insights there. What are approaches you’ve tested?


Chris Tate@ctatedev·12h

@elie2222 @nbaschez I've been thinking a lot about this lately


Chris Tate@ctatedev·1d

~100% of my dev is done in sandboxes in the cloud

Highly recommend it:
- Unlimited parallel agent sessions
- My local machine stays safe
- Can work from anywhere
- Can close laptop
- Lap stays cool

Interesting idea to visualize with Kanban

Ryan Carson@ryancarson

100% of dev is going to be done in sandboxes in the cloud, controlled by kanban boards. Trust me, I love my local machine and gorgeous mac apps, but all of it is just a terrible form factor for running a team of agents effectively.


880

137.7K

Elie Steinbock — oss/acc@elie2222·12h

@nbaschez @ctatedev Ya, third-party services have been the real challenge for us :(
