Ilya Zayats

1K posts

@somebody32

I love to build products. CTO @factorialapp

Barcelona · Joined May 2008
172 Following · 568 Followers
Ilya Zayats (@somebody32)
@nateberkopec Yes, but not at massive scale yet. It's amazing for finding paper cuts that sum up fast. But also amazing at overfitting for a particular metric. Main problem is the old one: benchmark noise. Agents often pick things that don't actually move the needle, and vice versa
Nate Berkopec (@nateberkopec)
Has anyone applied an autoresearch loop (successfully or not) to a Ruby project outside of Shopify yet? I would like to cover it in my RubyKaigi talk.
Ilya Zayats (@somebody32)
@davebcn87 @0xRkyd That would help for benchmarks a lot, thanks! But it could still suffer in cases where noise increased after the baseline was taken, right? (dev machine reality) Not sure that could be solved at the harness level, though; the experiment setup needs to take care of it
David Cortés (@davebcn87)
Today we added statistical confidence on pi-autoresearch. Now the LLM gets a confidence metric that will guide it to re-run the experiment if there was noise detected during the measurement. Thanks @0xRkyd for bringing the idea.
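One way to picture a confidence metric for noisy benchmarks is the sketch below. It is not pi-autoresearch's actual method, which the post does not describe. The metric (median shift divided by the median absolute deviation), the function names, and the threshold of 2.0 are all assumptions made for illustration.

```python
import statistics


def mad(samples):
    """Median absolute deviation: a robust estimate of run-to-run noise."""
    med = statistics.median(samples)
    return statistics.median([abs(x - med) for x in samples])


def confidence(baseline, candidate):
    """Hypothetical metric: how many 'noise units' the median moved."""
    delta = abs(statistics.median(candidate) - statistics.median(baseline))
    noise = max(mad(baseline), mad(candidate))
    if noise == 0:
        return float("inf") if delta > 0 else 0.0
    return delta / noise


def should_rerun(baseline, candidate, threshold=2.0):
    """Ask for another run when the shift is within the noise floor."""
    return confidence(baseline, candidate) < threshold
```

Note that this only compares the two sample sets it is given. If the machine gets noisier after the baseline was taken, the baseline has to be re-measured too, which is the experiment-setup concern raised in the reply.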
Ilya Zayats (@somebody32)
Pi really changed how I think about dev tools. You don't tolerate friction anymore. You just ask your agentic harness to fix itself. File picker was slow in our monorepo? 30s later there's a workaround caching files and refreshing in the background
Ilya Zayats (@somebody32)
But still surprised that just 6 months ago the ratio was totally different: you'd need to work around model quirks constantly
Ilya Zayats (@somebody32)
AI work now: 5% prompts/models, 95% tools. People underestimate how much good tools boost AI performance
Ilya Zayats (@somebody32)
The more I work with Codex and Opus, the more stereotypical they become. One is a classical nerd: short explanations, lots of thinking, things work. The other is a young startup CEO: great pitches and presentations, but drops half your codebase when he decides it's time to pivot
Ilya Zayats reposted
Jarred Sumner (@jarredsumner)
@adamdotdev This “adult in the room” framing is pretty rude to the Claude Code team that built a product hitting $1B run-rate revenue faster than probably anything in history. Bun made like $2.50 total (stickers). Engineering is relative to time & tradeoffs & they made fantastic tradeoffs
Ilya Zayats (@somebody32)
Today I solved that with something too stupid to be true: asking the model to self-inspect and request increased reasoning when needed. Then you can do exactly that between tool calls
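A minimal sketch of that idea, assuming a harness where the reasoning level can be changed between calls. The post does not show its actual code. The `REQUEST_MORE_REASONING` marker, the `DONE` prefix, the level names, and the `call_model(task, history, level)` signature are hypothetical stand-ins, not any vendor's API.

```python
LEVELS = ["low", "medium", "high"]


def next_level(current, reply):
    """Bump reasoning one step if the model asked for it (marker is hypothetical)."""
    if "REQUEST_MORE_REASONING" in reply and current != LEVELS[-1]:
        return LEVELS[LEVELS.index(current) + 1]
    return current


def run_agent(call_model, task, max_steps=5):
    """Start cheap and escalate only when the model says it needs more."""
    level = "low"
    history = []
    for _ in range(max_steps):
        reply = call_model(task, history, level)
        history.append((level, reply))
        if reply.startswith("DONE"):
            break
        # The adjustment happens between calls, never in the middle of one.
        level = next_level(level, reply)
    return history
```

Simple queries finish at the low level, and only the ones the model flags as hard pay for deeper reasoning.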
Ilya Zayats (@somebody32)
Now the main dichotomy is speed vs. persistence. If you put a high reasoning level on the model, it's very persistent for complex queries but takes ages for simple ones (not even talking about the cost). And with low reasoning it tends to stop early without resolving anything
Ilya Zayats (@somebody32)
The current generation of models requires way less babysitting. I'm finding myself compressing really complicated workflows into just one agent with a ReAct loop. So much code can be removed
Ilya Zayats (@somebody32)
This morning I tried to look for some code that does the same via API, but couldn't find the exact flow. A lot of one-shot solutions but nothing that would allow models to come to consensus iteratively
Ilya Zayats (@somebody32)
Found myself lately often creating a "consortium" between Gemini Pro 2.5 and o3: passing the same task to both and then asking each to rate the opponent's answer and pick the best parts of both. A couple of iterations and you get a very well-rounded outcome.
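The flow described here could be sketched roughly as follows. This is an illustration under stated assumptions, not a known implementation. Each model is represented as a plain function from prompt to text, so any real API client would have to be wrapped to fit. The prompt wording, `review_prompt`, and `consortium` are hypothetical names.

```python
from typing import Callable

Model = Callable[[str], str]  # hypothetical wrapper: prompt in, text out


def review_prompt(task: str, own: str, other: str) -> str:
    """Build the cross-review prompt (wording is an assumption)."""
    return (
        f"Task: {task}\n"
        f"Your answer: {own}\n"
        f"Opponent's answer: {other}\n"
        "Rate the opponent's answer, then write a new answer "
        "that keeps the best parts of both."
    )


def consortium(task: str, model_a: Model, model_b: Model,
               rounds: int = 2) -> tuple[str, str]:
    """Both models answer, then repeatedly revise against each other."""
    answer_a = model_a(task)
    answer_b = model_b(task)
    for _ in range(rounds):
        prompt_a = review_prompt(task, own=answer_a, other=answer_b)
        prompt_b = review_prompt(task, own=answer_b, other=answer_a)
        answer_a, answer_b = model_a(prompt_a), model_b(prompt_b)
        if answer_a == answer_b:  # identical answers: treat as consensus
            break
    return answer_a, answer_b
```

Real models rarely converge on identical text, so in practice the loop usually runs for the fixed number of rounds and a human or a third call picks the final answer.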
Ilya Zayats (@somebody32)
Plan/act separation will have more stages and more artifacts. Different models will be optimized for each. But these artifacts will always be required and must stay up-to-date to keep models focused. And that leads us back to waterfall, but just at ludicrous speeds
Ilya Zayats (@somebody32)
I wouldn't be surprised if waterfall becomes the de facto way to build software again pretty soon. It was long and clumsy for humans but perfect for AI.
Dmitry Zaets (@dmitryzaets)
Weekend routine: change and wash the keycaps. How often do you wash your keyboards?
Ilya Zayats (@somebody32)
@masylum I do the same for all the todos I have (which is one long Obsidian note). That is the peak productivity system
Pao Ramen (@masylum)
Productivity tip for solo projects. One file called TODO.md in your repo. Contains a list of "tasks" like `- [ ] redesign the dashboard`. When you start working on it, you add a `.` inside the checkbox. When you finish, you remove the task. That's all you need.
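The convention is simple enough to script. The sketch below assumes that "adding a `.` inside the checkbox" turns `- [ ]` into `- [.]`; the post does not spell out the exact form. The helper names are made up for illustration.

```python
def start(todo: str, task: str) -> str:
    """Mark a pending task as in progress: '- [ ] x' becomes '- [.] x'."""
    return "\n".join(
        f"- [.] {task}" if line == f"- [ ] {task}" else line
        for line in todo.split("\n")
    )


def finish(todo: str, task: str) -> str:
    """Finished tasks are removed from the file, not ticked off."""
    done = {f"- [ ] {task}", f"- [.] {task}"}
    return "\n".join(line for line in todo.split("\n") if line not in done)


def in_progress(todo: str) -> list[str]:
    """List the tasks currently marked with a dot."""
    prefix = "- [.] "
    return [line[len(prefix):] for line in todo.split("\n")
            if line.startswith(prefix)]
```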