Tom Glowacki
161 posts

@DigitalTom86
Finance | AI | Decade in Corp | ex-VC at Big Asset Management Shop






🚨BREAKING: Alibaba tested AI coding agents on 100 real codebases, spanning 233 days each. the agents failed spectacularly.

turns out passing tests once is easy. maintaining code for 8 months without breaking everything is where AI collapses.

SWE-CI is the first benchmark that measures long-term code maintenance instead of one-shot bug fixes. each task tracks 71 consecutive commits of real evolution.

75% of AI models break previously working code during maintenance. only Claude Opus 4 stays above 50% zero-regression rate. every other model accumulates technical debt that compounds over iterations.

here's the brutal part:
- HumanEval and SWE-bench measure "does it work right now"
- SWE-CI measures "does it still work after 6 months of changes"

agents optimized for snapshot testing write brittle code that passes tests today but becomes unmaintainable tomorrow.

Alibaba built EvoScore to weight later iterations heavier than early ones. agents that sacrifice code quality for quick wins get punished when consequences compound.

the AI coding narrative just got more honest: most models can write code. almost none can maintain it.
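
A rough sketch of the two ideas in the post above: a score that weights later iterations more heavily, and a zero-regression check. The post gives no formula, so this is not EvoScore's actual definition. The linear weights and the names `later_weighted_score` and `is_zero_regression` are assumptions made purely for illustration.

```python
from typing import Sequence, Set


def later_weighted_score(pass_rates: Sequence[float]) -> float:
    """Weighted mean of per-iteration pass rates.

    ASSUMPTION: weights grow linearly (1, 2, 3, ...), so later iterations
    count more. The real EvoScore weighting is not given in the post.
    """
    if not pass_rates:
        raise ValueError("need at least one iteration")
    weights = range(1, len(pass_rates) + 1)
    return sum(w * p for w, p in zip(weights, pass_rates)) / sum(weights)


def is_zero_regression(passing_by_iteration: Sequence[Set[str]]) -> bool:
    """True if every test that passed at one iteration still passes at the next.

    ASSUMPTION: each iteration is represented as the set of passing test names.
    """
    return all(
        prev <= cur  # subset check: nothing that passed before is missing now
        for prev, cur in zip(passing_by_iteration, passing_by_iteration[1:])
    )


# Two hypothetical agents with the same average pass rate (0.75):
early_winner = [1.0, 0.5]  # strong start, then degrades
late_winner = [0.5, 1.0]   # weak start, then improves
print(later_weighted_score(early_winner) < later_weighted_score(late_winner))
```

Under this weighting, the agent that degrades over time scores lower than the one that improves, even though a plain average would rate them the same.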




just got back from a visit to a company that ran a series of layoffs and bought everyone Claude Max, and now, how should I put it, all of its deliverables for the previous year are fucked. the AI revolution isn't slowing down







YES. Wintermute is suing Binance. And they are not the only ones that got rekt. They lost hundreds of millions. I have all the names of who is about to blow up. It's not going to be pretty. Unless CZ finds a way to compensate them, this is going to be bad. For weeks, everyone backstage has known. And sharks love blood. Don't think this is another FTX. THIS IS PURIFICATION. Fuck Wintermute.














