Aljosa Asanovic

149 posts

@aljosa

All those prompts will be lost in time, like tears in rain. Working on https://t.co/QYIEBmalO8 @enginedotbuild & https://t.co/s0VkJ22xCg

Québec, Canada · Joined June 2010
98 Following · 182 Followers
Aljosa Asanovic @aljosa
This is what an @enginedotbuild run looks like locally [1h 58m run sped up to 30s]

- Implementing an OpenSpec change
- 1h 58m, 5 phases, 13 review rounds, 57 sessions
- GPT 5.5 implementing
- 3 reviewers in parallel per phase:
  ∙ 2x spec-compliance (1x Opus 4.7 + 1x GPT-5.5)
  ∙ 1x codebase-patterns (GPT-5.5)
- $81.65 in API-equivalent cost (ran on Claude Max + Codex subscriptions)

Using a simple ink (by @vadimdemedes) CLI on top of the engine library and the pluggable engine/pi runtime using pi-agent-core/pi-ai (by @badlogicgames).

I built this because watching an agent for two hours to make sure it doesn't sneak garbage into a PR is exhausting.

The bigger idea: maintainers add a .engine config to their repo, and every agent-authored PR has to earn its way through a real review gate. So for OSS repos it's a gate against AI-generated slop PRs.

Open sourcing it soon!
0 replies · 1 repost · 1 like · 43 views
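The "3 reviewers in parallel per phase" setup above can be sketched roughly like this. The `Reviewer` and `Finding` shapes and the function names are illustrative assumptions, not engine's actual API:

```typescript
// Illustrative sketch only: Reviewer/Finding shapes are assumptions,
// not engine's real types.
type Finding = { reviewer: string; message: string };

type Reviewer = {
  name: string;   // e.g. "spec-compliance" or "codebase-patterns"
  model: string;  // e.g. "opus-4.7" or "gpt-5.5"; providers can be mixed
  review: (diff: string) => Promise<Finding[]>;
};

// Run every reviewer against the same diff concurrently and flatten
// their findings. An empty array means the phase passed review.
async function reviewPhase(
  diff: string,
  reviewers: Reviewer[],
): Promise<Finding[]> {
  const results = await Promise.all(reviewers.map((r) => r.review(diff)));
  return results.flat();
}
```

Running reviewers concurrently rather than sequentially is what keeps the per-phase wall-clock cost close to the slowest single reviewer.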
Aljosa Asanovic @aljosa
@lucasmeijer It's indeed the absolute peak of frontend technology. Also things like `version` in SvelteKit to force a refresh: svelte.dev/docs/kit/confi… Somehow I still don't feel safe unless I also do some kind of build-triggered cache-clear flow in Cloudflare.
0 replies · 0 reposts · 0 likes · 58 views

Lucas Meijer @lucasmeijer
Ok apparently state of the art is “rename all your assets so they include their content hash and make everything that points to a js or video or audio file be aware of that”
5 replies · 0 reposts · 5 likes · 633 views
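The renaming scheme Lucas describes (fingerprint each asset by its content hash, so a changed file always gets a new URL and old cached copies are never served stale) can be sketched like this; the helper is hypothetical build-step code, not any specific bundler's API:

```typescript
import { createHash } from "node:crypto";
import { basename, extname } from "node:path";

// Hypothetical build-step helper: derive a fingerprinted filename from
// the file's contents. Any change to the bytes yields a new name, so
// clients can cache the old URL forever without ever seeing stale code.
function hashedAssetName(fileName: string, contents: string | Buffer): string {
  const hash = createHash("sha256").update(contents).digest("hex").slice(0, 8);
  const ext = extname(fileName);
  return `${basename(fileName, ext)}.${hash}${ext}`;
}

// Everything that points at the asset (HTML, CSS, JS imports) must then
// be rewritten via a manifest mapping original names to hashed ones.
const manifest: Record<string, string> = {
  "app.js": hashedAssetName("app.js", "console.log('v2');"),
};
```

This is the part Lucas flags as painful: the hashing itself is trivial, but every reference to a JS, video, or audio file has to go through the manifest.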
Lucas Meijer @lucasmeijer
Every time I do some web work, I have to remember how you deal with the problem of "all clients just keep old copies of parts of your program around"
4 replies · 0 reposts · 35 likes · 5K views

Aljosa Asanovic @aljosa
> over 1000 GitHub commits per day
> spending 12k per month
> gets asked to show his most impressive product
> is literally selling "premium" .md files to noobs

He sure showed you @zeeg! Lmao. The amount of slop is unfathomable
[image attached]

Artur Podsiadły @artpods56

@doodlestein @0xLewis_gg baahahha, bro is selling markdown files

5 replies · 0 reposts · 54 likes · 24.3K views

Aljosa Asanovic @aljosa
@KyleBoas_ @hunvreus @enginedotbuild @OpenSpec_ Getting very close to the release 👌🏻 I think most people end up doing some minimal variation of this for their own workflow. I'm sure that it will need plenty of feedback and testing to perfect it too but I think it solves that specific problem better than anything available.
0 replies · 0 reposts · 1 like · 18 views

Ronan Berder @hunvreus
Talking to smarter folks than me, I'm convinced many of the AI folks in my timeline are full of shit. Nobody is "running 20 agents over night" and building stuff for actual users.

Maybe some are building internal tools or disposable software. Maybe. But building software people like using? That doesn't get hacked on day one or blow up after the 3rd user? Nope.

I don't even understand what that's supposed to look like. Do you work out a 57-page document that perfectly describes what you want to build, then summon 14 agents and have them run wild for 6 hours? And what comes out on the other end isn't a broken pile of shit? Nope. Not buying it.

PS: it may also be that I have an IQ of 82 and can't figure it out.
669 replies · 268 reposts · 4.9K likes · 795.1K views

Aljosa Asanovic @aljosa
@mitsuhiko What do you mean? Schuldbefreiungstugendwohlstandsgefühl is a perfectly cromulent word.
0 replies · 0 reposts · 0 likes · 86 views

Armin Ronacher ⇌ @mitsuhiko
I find it bizarre but also interesting that there is no German word for “equity”. It explains a lot about the way people do business and transact here.
114 replies · 6 reposts · 513 likes · 106.9K views

Aljosa Asanovic @aljosa
I've never used Alloy, but looking it up quickly, it seems like it fits cases where the spec has rules that can be formally checked, which is a different kind of problem from what engine is doing.

It's also not really Gherkin. Gherkin is test scenarios in plain English that get mapped to actual test code. OpenSpec artifacts are more a structured description of the change itself: the requirements, design notes, and tasks to do it. The agent reads those to know what to build, and the checking happens afterwards through the reviewers and the verify command, not by proving anything upfront.

For the harness and the "build the loop for your use case" bit, that's exactly what the reviewers and the verify command are in engine. You write those yourself per repo, and they're where the actual judgment about your codebase lives. The rest of it (running the phase, catching failures, sending issues to a fix agent, committing when it's clean, keeping state so you can resume) is the same no matter what you're building.

So really the only part engine is opinionated about is that you run multiple passes in a structured way and force the LLM through the same process every phase: implement, verify, review, fix until it's clean, commit, move on. The reviewers, the requirements, the extra instructions, the models you want to use: that's all yours to plug in.

I think the main issue for me was always that the LLM inside whatever harness you're using shouldn't be what owns and manages the orchestration loop; it needs to be external and mechanical. For engine specifically, the goal is not to solve agentic loops for a bunch of use cases. It solves a specific use case very well: building code that matches your intent without having to babysit the models. You're the one who defines the intent and the quality requirements.
0 replies · 0 reposts · 0 likes · 16 views
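The "keeping state so you can resume" part mentioned above can be sketched as a tiny checkpoint structure. The `RunState` shape here is an assumption for illustration, not engine's actual on-disk format:

```typescript
// Illustrative checkpoint for a resumable run; the shape is an
// assumption, not engine's real persistence format.
type RunState = {
  change: string;            // e.g. an OpenSpec change id
  completedPhases: number;   // phases already verified, reviewed, committed
  lastCommit: string | null; // hash of the last clean auto-commit
};

// Phases are 0-indexed, so the first uncommitted phase is simply the
// count of completed ones; a resumed run starts here instead of at 0.
function nextPhase(state: RunState): number {
  return state.completedPhases;
}

// Record a phase as done only once it is clean and committed, so a
// crash mid-phase replays that phase rather than skipping it.
function markPhaseDone(state: RunState, commit: string): RunState {
  return {
    ...state,
    completedPhases: state.completedPhases + 1,
    lastCommit: commit,
  };
}
```

Because the checkpoint only advances on a committed, clean phase, resuming can never pick up from half-reviewed work.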
Hannes Lehmann @_hanneslehmann_
So it looks like Gherkin at first sight? A friend of mine is experimenting with Alloy to prove specs are right (if I understood correctly) and use AI only to code-generate proven specs within strict guardrails. I think the big contradiction: you cannot have a generic harness building specific code. Skills and instructions are too weak. You need to build an agentic loop for your use cases. Which ends up meaning you could build the product directly in the first place.
1 reply · 0 reposts · 1 like · 44 views

Aljosa Asanovic @aljosa
I think the difference is that before, I would spend 8 hours supervising as I went along to make sure I course-correct and don't end up with a mountain of crap like you said. Now I'll spend 2 hours on a spec (which I find more rewarding) and I know I can trust the engine implementation to deliver on exactly what I wanted without getting slop. During that time I don't have to micromanage. I can just focus on the next feature or go work out (or nap). I'm excited for people to try it out! But I also fully get your point of view. I think anyone who seriously produces code with AI feels the same way.
1 reply · 0 reposts · 5 likes · 387 views

Ronan Berder @hunvreus
I'm also building agents with Pi, and I do read less and less of the code AI generates for me. But I'm still not running stuff unsupervised for hours. I suppose I could if I fed it a large enough scope, but pretty sure I'd end up with a mountain of crap to review. Way easier to go step by step, challenge it as we go and regularly refactor/optimize.
1 reply · 0 reposts · 7 likes · 424 views

Aljosa Asanovic @aljosa
I was writing an *extensive* reply to you and my Twitter app crashed 🥲. Rewriting now 😵

Most of the time I invest is in writing the specs correctly (using @OpenSpec_ right now). I genuinely think it's possible to produce a lot of good code by managing multiple agents, but it's by far the most mentally exhausting thing I've ever done. I understand why so many people burn out trying to push the limits with AI coding. We were basically sold on the promise of AI being a force multiplier, but then you have to figure out how to stop being the bottleneck to actually benefit from it (specifically for LLM-based coding, I mean). That's without micromanaging agents and stopping whatever LLM you're using from doing something you've told it ten times not to do, no matter how much Sam Altman told you it was PhD level.

I'll be releasing the open source @enginedotbuild library soon that I mentioned in my previous tweet. It's basically a fully orchestrated (and fully customizable) phased implementation -> verification -> review loop. You define reviewers for your repo to start (I use, for example, a spec-compliance and a codebase-patterns reviewer). It takes in either a task object or natively consumes OpenSpec artifacts and then launches an implementation agent for a phase. Once implementation is completed, you can have an automatic verification command run first (tests/linting/typecheck), and then reviewers kick in; any issues they find get sent to a fix agent. This keeps repeating until no more issues are found, and only then does it auto-commit the work and continue to the next phase, where the process repeats. You're also not limited to any provider: you can have Opus 4.7 implementing and a mix of GPT 5.5 and any other model provider reviewing, for example.

LLMs will statistically fuck up; you basically have to go through this process if you want any hope of your full set of requirements being met while maintaining code quality. I'd compare it to playing poker: if you make decisions with a positive expected value, over time you're guaranteed to make money, but variance means in the short term you can still lose.

So to get back to your actual questions: I spend much less time now reviewing code. I do it mostly to evaluate the quality of results that engine produces, and I spend basically close to zero time correcting code because the system actually works as intended. I'd say my average runs are maybe 2 hours, and the longest was somewhere around 10 hours for a big change. There's really no limit to how long it can run, but my goal isn't to stuff everything into a massive change; it's to stop having to micromanage LLMs. It feels really good to start a run before going to bed and wake up to something that actually works, is well tested and reviewed.

I'm building some products on top of the engine library prior to release (an MCP server, a CLI, and a Pi.dev extension). So you can keep using Claude Code, Codex, Cursor, whatever you're using, and you just offload the actual implementation to the engine. Or you can build your own solution using the engine library directly. It doesn't solve the magical "20 agents running all night" situation; it just gives you a really robust flow to produce high-quality results.
3 replies · 0 reposts · 15 likes · 549 views
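The phased implementation -> verification -> review -> fix loop described above can be sketched mechanically like this. Every name is a placeholder for whatever agents and commands you plug in; this is not engine's actual API:

```typescript
// Placeholder types: each hook stands in for an agent or command the
// user supplies; none of this is engine's real API.
type Phase = { id: number; tasks: string[] };

type Hooks = {
  implement: (phase: Phase) => Promise<void>; // implementation agent
  verify: () => Promise<string[]>;            // tests/lint/typecheck failures
  review: () => Promise<string[]>;            // reviewer findings
  fix: (issues: string[]) => Promise<void>;   // fix agent
  commit: (phase: Phase) => Promise<void>;    // auto-commit when clean
};

// The loop is external and mechanical; the LLM never owns orchestration.
// Each phase repeats verify + review + fix until both come back clean,
// and only then is the work committed and the next phase started.
async function runChange(phases: Phase[], hooks: Hooks): Promise<void> {
  for (const phase of phases) {
    await hooks.implement(phase);
    for (;;) {
      const issues = [...(await hooks.verify()), ...(await hooks.review())];
      if (issues.length === 0) break;
      await hooks.fix(issues);
    }
    await hooks.commit(phase);
  }
}
```

The key design choice is that the inner loop has no model in charge of it: the commit only happens when the mechanical checks report zero issues, which is what makes unattended multi-hour runs trustworthy.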
Ronan Berder @hunvreus
@aljosa @enginedotbuild I’m curious: do you spend a couple of hours every day reviewing what was generated? Do you then spend another few hours correcting it? How large of a scope are we talking about and how long do they run for usually? Genuinely curious.
1 reply · 0 reposts · 13 likes · 2.2K views

Aljosa Asanovic @aljosa
They absolutely nailed the verbosity on GPT 5.5 compared to 5.4. It's a joy to use so far.
0 replies · 0 reposts · 0 likes · 204 views

OpenAI @OpenAI
Introducing GPT-5.5 A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.
2.4K replies · 7K reposts · 51.5K likes · 12.1M views

Aljosa Asanovic @aljosa
> be me
> run /ultrareview on a massive PR built with @enginedotbuild
> zero findings

Not sure if I should be happy about the crisp engine PR quality or sad that I wasted one of 3 free ultrareviews.
[GIF attached]

ClaudeDevs @ClaudeDevs

New in Claude Code: /ultrareview (research preview) runs a fleet of bug-hunting agents in the cloud. Findings land in the CLI or Desktop automatically. Run it before merging critical changes—auth, data migrations, etc. Pro and Max users get 3 free reviews through 5/5.

0 replies · 0 reposts · 0 likes · 166 views

Aljosa Asanovic @aljosa
@badlogicgames Yep, have been using my own build. Sounds good, I'll wait for the refactor and happy to clean it up once it's done to make it easier 🤝
0 replies · 0 reposts · 0 likes · 9 views

Mario Zechner @badlogicgames
@aljosa unless you use it yourself, don't bother. the refactor will change internals enough that rebasing over and over will get super annoying. however, once the refactor is done, implementing interrupt() should be trivial (and i'll do it)
1 reply · 0 reposts · 1 like · 42 views

Mario Zechner @badlogicgames
@aljosa yeah, that's gonna take a little while, needs a refactor of AgentSession, which is pending for weeks now because people keep submitting other stuff...
2 replies · 0 reposts · 2 likes · 130 views

Aljosa Asanovic @aljosa
@icanvardar Well, you wouldn't; they all live in gated communities paid for with their (made with Lovable) apps.
0 replies · 0 reposts · 0 likes · 147 views

Can Vardar @icanvardar
never met a lovable user before
30 replies · 3 reposts · 125 likes · 7K views

Aljosa Asanovic @aljosa
POV: Codex is down (you took refuge from Claude always being down)
[image attached]
0 replies · 0 reposts · 0 likes · 309 views

Aljosa Asanovic @aljosa
@robzolkos You should sell pre-built lego sets. I don't want all that "building it myself and having fun doing it" tax.
1 reply · 0 reposts · 1 like · 802 views

Rob Zolkos @robzolkos
If you are pi-curious but don't have the time to set everything up I made lazypi.org for you. It's a curated starting point that gets you to an exciting, useful experience immediately, without the research and configuration tax.
40 replies · 47 reposts · 694 likes · 38.8K views