

Robert Balicki (👀 @IsographLabs)
@StatisticsFTW
@isographlabs framework author. Currently @Pinterest. Ex-React Data Team @Facebook. Co-organizer of #RustNYC. I like Rust, Relay, stats, GraphQL, React, JS




I'm interested in "trapped buildings": those that couldn't be built today (because of zoning and code changes) but also can't be substantially modified or demolished (because of historic protection rules). One of those phenomena that really makes one wonder what exactly we're trying to do. Has anyone ever estimated what fraction of buildings in major cities fall into this category? When I asked Claude about San Francisco, it concluded: "If forced to give a single number with a single confidence rating: roughly 100,000 buildings — about two-thirds of San Francisco's physical structures — sit in the trap as a practical matter. Confidence: moderate. The number could be 70,000 or 130,000 depending on how strictly you operationalize 'can't be substantially modified.'"


While alternative coding harnesses may offer short-term lift, they will be bitter lesson'd away. I am bearish on any harness that doesn't come from the lab whose model you are using: you're fighting against post-training. To put a finer point on this, you know how ioctls are like "huh, that's weird, but I guess it's what we've got, we can work with that"? It is exactly the same with the particular JSON construction the Codex shell tool uses. The model used to mangle nested quotes in that monstrosity of an RPC all the time, but now it does not, and it does not matter that the API is bad, because billions of failed invocations are used to train the model to the harness we have, not the harness we deserve.

● "I need to flag something serious: the merge commit was pushed to origin/main despite my instruction not to push." - Claude 4.7 Max Thinking

There's something fundamentally wrong with these models. Clean context + specific instructions, but they still fuck up in simple ways.





Mandatory human-in-the-loop is a cybersecurity cop-out. People are giving agents more and more autonomy. We need solutions that accept that world because there is no stopping it. It's like telling people in the 90s to not use the internet to avoid getting hacked. Good luck.



🎪 I'm extremely excited to announce the release of version 0.3 of Barnum! Barnum is now a programming language for asynchronous and parallel computation whose goal is to make it extremely easy for you to orchestrate your agents!

So why use a programming language for this? Why not just use plan mode or a markdown file for the complicated cases? Well, LLMs are incredibly powerful tools, but they certainly aren't reliable. If an LLM is in charge, you risk it changing its mind and implementing something else, or disabling unit tests and generally cutting corners. (Very relatable, to be honest.) Furthermore, it is hard to accurately express complicated workflows with loops and conditionals in prose.

The answer is to use a workflow engine. Barnum is a workflow engine masquerading as a programming language. When a workflow engine is in charge, your LLMs can't wriggle out of requirements, and it's easier to accurately describe the actual, complicated workflow. And it's this increase in reliability that allows you to build bigger, more impactful agentic workflows.

Already, I've used Barnum to ship hundreds of PRs. Other folks have used it to push forward on automated migrations, remove dead code, implement a RAG search pipeline, and validate all of the statements in publicly facing documentation.

I hope you give it a try!

pnpm install @barnum/barnum

But read on for more cool details...
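The "workflow engine in charge" idea can be sketched in plain code. This is not Barnum's actual API (every name below is invented for illustration); it is a minimal TypeScript sketch of what a deterministic harness enforces that a prose plan cannot: the loop and the exit condition live in code, so the model cannot decide to skip the tests.

```typescript
// Hypothetical sketch, not Barnum's API: retry an agent until its
// output passes a deterministic check. The engine, not the model,
// decides when the loop ends, so the agent cannot "change its mind"
// about the requirement or quietly disable the gate.
type Agent = (prompt: string) => Promise<string>;

async function implementUntilTestsPass(
  agent: Agent,
  task: string,
  runTests: (code: string) => boolean,
  maxAttempts = 3,
): Promise<string> {
  let feedback = "";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const code = await agent(`${task}\n${feedback}`);
    // Hard gate: the check is code, not the model's self-assessment.
    if (runTests(code)) return code;
    feedback = "Previous attempt failed tests; fix and retry.";
  }
  throw new Error(`No passing implementation after ${maxAttempts} attempts`);
}
```

Loops and conditionals like this are exactly the structures that are awkward to express reliably in a markdown plan but trivial in a programming language.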









