Ilya Zayats

1K posts

@somebody32

I love to build products. CTO @factorialapp

Barcelona · Joined May 2008

172 Following · 568 Followers

Ilya Zayats@somebody32·3h

@nateberkopec Yes, but not at massive scale yet. It's amazing for finding paper cuts that add up fast, but also amazing at overfitting to a particular metric. The main problem is the old one, benchmark noise: agents often pick things that don't actually move the needle, and vice versa

Nate Berkopec@nateberkopec·5h

Has anyone applied an autoresearch loop (successfully or not) to a Ruby project outside of Shopify yet? I would like to cover it in my RubyKaigi talk.

Ilya Zayats@somebody32·5d

Seeing this more and more every day. The difference is stark

antirez@antirez

Wow, I totally disagree with this statement. At the current state, AI actually amplifies the developer to developer difference. If you were a 10x developer, you had good ideas + architectural clarity, this is a brutal advantage when using AI. Steering is a fundamental part of today's AI development.

Ilya Zayats@somebody32·6d

@davebcn87 @0xRkyd That would help a lot for benchmarks, thanks! But it could still suffer in cases where noise increased after the baseline was taken, right? (dev machine reality) I'm not sure that can be solved at the harness level, though; the experiment setup needs to take care of it

David Cortés@davebcn87·6d

Today we added statistical confidence on pi-autoresearch. Now the LLM gets a confidence metric that will guide it to re-run the experiment if there was noise detected during the measurement. Thanks @0xRkyd for bringing the idea.
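One way to picture a confidence signal like the one described above is a plain significance check on repeated timings. The sketch below is an assumption for illustration only, not pi-autoresearch's actual implementation: the function names, the use of Welch's t-statistic, and the threshold of 3.0 are all hypothetical choices.

```python
import math
import statistics


def welch_t(baseline: list[float], candidate: list[float]) -> float:
    """Welch's t-statistic for the difference in means (candidate - baseline)."""
    diff = statistics.mean(candidate) - statistics.mean(baseline)
    # Squared standard error of each sample mean.
    se_sq = (statistics.variance(baseline) / len(baseline)
             + statistics.variance(candidate) / len(candidate))
    if se_sq == 0:
        # No measured noise at all: any non-zero difference is decisive.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se_sq)


def verdict(baseline: list[float], candidate: list[float],
            threshold: float = 3.0) -> str:
    """Return 'faster', 'slower', or 'rerun' when the gap is within noise.

    Samples are timings, so lower is better. The threshold is a
    hypothetical default, not a value taken from any real tool.
    """
    t = welch_t(baseline, candidate)
    if abs(t) < threshold:
        return "rerun"
    return "faster" if t < 0 else "slower"


if __name__ == "__main__":
    print(verdict([100, 102, 98, 101, 99], [80, 82, 79, 81, 78]))
    print(verdict([100, 120, 90, 110, 95], [98, 118, 92, 108, 96]))
```

A check like this only compares the two sets of samples it is given. If the machine gets noisier after the baseline was recorded, the baseline samples have to be re-measured alongside the candidate for the comparison to stay fair.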

Ilya Zayats@somebody32·Feb 20

Pi really changed how I think about dev tools. You don't tolerate friction anymore. You just ask your agentic harness to fix itself. File picker was slow in our monorepo? 30s later there's a workaround caching files and refreshing in the background

Ilya Zayats@somebody32·Feb 20

But still surprised that just 6 months ago the ratio was totally different: you'd need to work around model quirks constantly

Ilya Zayats@somebody32·Feb 20

AI work now: 5% prompts/models, 95% tools. People underestimate how much good tools boost AI performance

Ilya Zayats@somebody32·Feb 11

The more I work with Codex and Opus, the more stereotypical they become. One is a classic nerd: short explanations, lots of thinking, things work. The other is a young startup CEO: great pitches and presentations, but drops half your codebase when he decides it's time to pivot

Ilya Zayats reposted

Jarred Sumner@jarredsumner·Feb 4

@adamdotdev This “adult in the room” framing is pretty rude to the Claude Code team that built a product hitting $1B run-rate revenue faster than probably anything in history. Bun made like $2.50 total (stickers). Engineering is relative to time & tradeoffs & they made fantastic tradeoffs

Ilya Zayats@somebody32·Feb 1

Today I solved that with something too stupid to be true: asking the model to self-inspect and request increased reasoning when needed. Then you can do exactly that between tool calls
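The idea above can be sketched as a small agent loop in which the harness raises the reasoning level whenever the model asks for it, in between tool calls. Everything here is a hypothetical illustration: the `escalate` action, the level names, and the `model` and `tools` callables are assumptions and do not correspond to any specific vendor API.

```python
from typing import Any, Callable

# Hypothetical reasoning levels, ordered from cheapest to most persistent.
LEVELS = ["low", "medium", "high"]

Action = dict[str, Any]
History = list[tuple[str, str]]


def run_agent(
    model: Callable[[History, str], Action],
    tools: dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 20,
) -> tuple[str, str]:
    """Run a tool loop; return (final answer, reasoning level in use at the end)."""
    level = 0
    history: History = [("user", task)]
    for _ in range(max_steps):
        action = model(history, LEVELS[level])
        kind = action["type"]
        if kind == "escalate":
            # The model self-inspected and asked for more reasoning:
            # bump the level (capped at the top) before the next call.
            level = min(level + 1, len(LEVELS) - 1)
            history.append(("harness", f"reasoning={LEVELS[level]}"))
        elif kind == "tool":
            result = tools[action["name"]](action["input"])
            history.append(("tool", result))
        elif kind == "final":
            return action["text"], LEVELS[level]
        else:
            raise ValueError(f"unknown action type: {kind!r}")
    raise RuntimeError("step budget exhausted without a final answer")
```

Starting every task at the lowest level keeps simple queries fast and cheap, while the escalation path gives complex ones the persistence of a higher level only when the model itself judges it necessary.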

Ilya Zayats@somebody32·Feb 1

Now the main dichotomy is speed vs. persistence. If you put a high reasoning level on the model, it's very persistent for complex queries but takes ages for simple ones (not even talking about the cost). And with low reasoning it tends to stop early without resolving anything

Ilya Zayats@somebody32·Feb 1

The current generation of models requires way less babysitting. I'm finding myself compressing really complicated workflows into just one agent with a ReAct loop. So much code can be removed

Ilya Zayats@somebody32·Oct 9

Really happy for @mastra and their progress! We’ve been collaborating for the past months, and the speed at which they’re improving the framework is on another level

Sam Bhagwat@calcsam

we raised a $13m seed round from 120+ of Silicon Valley’s top investors for @mastra, the leading TypeScript agent framework

Ilya Zayats@somebody32·Jul 25

Really waiting for @ArtificialAnlys to create a coding agents arena

Ilya Zayats@somebody32·Jul 12

This morning I tried to look for some code that does the same via API, but couldn't find the exact flow. A lot of one-shot solutions but nothing that would allow models to come to consensus iteratively

Ilya Zayats@somebody32·Jul 12

Lately I've often found myself creating a "consortium" between Gemini 2.5 Pro and o3: passing the same task to both and then asking each to rate the opponent's answer and pick the best parts of both. A couple of iterations and you get a very well-rounded outcome.
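The "consortium" flow described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions: each model is represented as a plain callable from prompt to text (a hypothetical stand-in for a real API client), and the prompt wording and the stop-on-agreement rule are invented for the example.

```python
from typing import Callable

# Hypothetical stand-in for a real API client: prompt in, text out.
Model = Callable[[str], str]


def review_prompt(task: str, own: str, other: str) -> str:
    """Ask a model to rate the opponent's answer and merge the best parts."""
    return (
        f"Task:\n{task}\n\n"
        f"Your previous answer:\n{own}\n\n"
        f"Opponent's answer:\n{other}\n\n"
        "Rate the opponent's answer, then write an improved answer that "
        "keeps the best parts of both."
    )


def consortium(task: str, model_a: Model, model_b: Model,
               rounds: int = 2) -> tuple[str, str]:
    """Iterate cross-review between two models; return both final answers."""
    answer_a = model_a(task)
    answer_b = model_b(task)
    for _ in range(rounds):
        # Build both prompts from the same round's answers before calling
        # either model, so neither side sees the other's revised text early.
        prompt_a = review_prompt(task, own=answer_a, other=answer_b)
        prompt_b = review_prompt(task, own=answer_b, other=answer_a)
        answer_a, answer_b = model_a(prompt_a), model_b(prompt_b)
        if answer_a == answer_b:
            break  # Exact agreement: nothing left to reconcile.
    return answer_a, answer_b
```

In practice two models will rarely produce identical text, so the round limit is what usually ends the loop; a final step that asks one model to pick or merge the two answers is a natural extension.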

Ilya Zayats@somebody32·Feb 23

Plan/act separation will have more stages and more artifacts. Different models will be optimized for each. But these artifacts will always be required and must stay up to date to keep models focused. And that leads us back to waterfall, just at ludicrous speed

Ilya Zayats@somebody32·Feb 23

I wouldn't be surprised if waterfall becomes the de facto way to build software again pretty soon. It was long and clumsy for humans but perfect for AI.

Ilya Zayats@somebody32·Oct 12

@dmitryzaets They were so dirty that they turned purple? :)

Dmitry Zaets@dmitryzaets·Oct 12

Weekend routine: change and wash the keycaps. How often do you wash your keyboards?

Ilya Zayats@somebody32·Jun 17

@masylum I do the same for all the todos I have (which is one long Obsidian note). That is the peak productivity system

Pao Ramen@masylum·Jun 16

Productivity tip for solo projects. One file called TODO.md in your repo. Contains a list of "tasks" like `- [ ] redesign the dashboard`. When you start working on it, you add a `.` inside the checkbox. When you finish, you remove the task. That's all you need.
