Pinned Tweet
Matan Grinberg
1.8K posts

Matan Grinberg retweeted


@MartinShkreli Sorry for the confusion here. Need to make Mission mode more obvious. DMing
Matan Grinberg retweeted

tbh @droid maxing is life

Faisal @infmes
Almost 3B tokens used in the last month using droid missions
Matan Grinberg retweeted

Zero humans. Forty features. Nine hours.
Jensen Huang says AGI is here. I agree, and I have the screenshots to prove it.
Yeah, I know. You've read this headline forty times this week. Scroll past, nobody would blame you. But I've been quietly using something since February that most people haven't caught up to yet...it's not Opus 5.
The screenshots are a system that just finished a full architecture refactor of a production codebase. It found duplication I'd been living with for months and left the codebase meaningfully better than when it started. The system is Factory AI's Droid, specifically a feature called Missions.
I paddle outrigger canoe in Hawaii. Our club needed an app. Not a vibe slop app...a real app. Authentication, group management, crew assignments with rotation logic so the same person isn't always stuck in seat 6, real-time chat, weather integration that warns you when wind is above 15 mph, notifications, admin controls. The kind of thing that would've cost six figures to build a few years ago.
So I described what I wanted to Droid, and then I sat there watching it. It didn't just start writing code. It started asking me questions. Like, good questions. Clarifying questions about edge cases I hadn't even thought of. Then it broke the whole project into 14 milestones and 76 features. And before it wrote a single line of code, it created these things called validation contracts, basically testable assertions based on the spec so it knows what "done" actually looks like. I'm sitting there like...wait, it's planning the way I would plan?
Then it started building. But here's the part that got me. When a milestone finished, separate agents came in and tested everything from the user's perspective. NOT UNIT TESTS! These agents actually opened a browser in the background and clicked through the app the way a real person would. When something failed, it didn't just retry the same thing. It went back and re-steered the entire plan. I've never seen an AI system do that successfully for many hours.
I know what you're thinking: how many tokens did this cost?! Is it burning tokens the entire time? The answer has a lot more detail than just running a Ralph loop on steroids.
So I flew to San Francisco and sat with the Factory AI team to understand how this actually works under the hood. The orchestrator never writes code; it only delegates. Workers get cleared context between tasks so they don't hallucinate from stale state. And get this: the system isn't even tied to one model. You can run Claude as the orchestrator and GPT-5 as the worker. Their longest mission? Sixteen days. Can you imagine?
I've been building with AI live since August 2024. I've had fifty Claude windows open (well...you know what I mean), Codex running in parallel, the whole circus. You know the feeling. This is the first time I've felt like the system was genuinely thinking through a problem the way a senior engineer would scope a project before writing a single line of code.
Jensen told Lex Fridman that AGI means AI that can build a billion-dollar company. I don't know about a billion dollars, but it built my canoe club a production app while I went to make coffee. That's close enough for me.



Matan Grinberg retweeted

@droid @FactoryAI is going to win and it's not even close.
I was trying EVERYTHING to fix my holdout RMSE on a model I am working on and it kept getting caught over and over and even regressing.
One prompt using droid + GPT5.4 High fixed it. I spent almost $100 in Codex credits trying to fix this beforehand.
If I could, I would buy the Max plan TODAY.

Matan Grinberg retweeted

We plugged the CLI into @FactoryAI @Droid and within 15 minutes had:
- Daily automated spend briefings pushed to Slack
- WoW and MoM vendor analysis with anomaly detection
- Vendor management automatically cross-referenced with Google CLI (gmail, drive, etc.)
- Automated spend alerts routed by category to the right person in Slack
(fake data Slack message below)

Ramp Labs @RampLabs
Today, we're releasing Ramp CLI to let agents manage your company's finances. 50+ tools across cards, bills, expenses, travel, and approvals. Fewer tokens than MCP, and comes with pre-built skills like receipt compliance and agentic purchasing.
Matan Grinberg retweeted

Just tested @FactoryAI for a couple of hours + adapted my opencode skills & plugins to droid. Will be daily driving it for a couple of weeks.
So far it's good; the only problem is that the CLI sometimes lags. Idk if it's a zellij/ghostty-specific issue or not.

Goreng @sudo_goreng
Anyone tried @FactoryAI before? Is it good or nah? factory.ai/pricing

@matanSF @bentossell Ok, this got me. Finally going to try it.

or use droid and get features like this without waiting 4 months :)
Claude @claudeai
New in Claude Code: auto mode. Instead of approving every file write and bash command, or skipping permissions entirely, auto mode lets Claude make permission decisions on your behalf. Safeguards check each action before it runs.