Robert Balicki (👀 @IsographLabs)

17.3K posts

@StatisticsFTW

@isographlabs framework author. Currently @Pinterest. Ex-React Data Team @Facebook. Co-organizer of #RustNYC. I like Rust, Relay, stats, GraphQL, React, JS

Manhattan, NY · Joined April 2012
1.3K Following · 1.7K Followers
R 'Nearest' Nabors @rachelnabors
Which would you log into to keep up with your socials and news?
[two attached images]
Robert Balicki (👀 @IsographLabs)
"p" and "o" are very near each other on the keyboard, yet "np" and "no" have very different meanings when responding to a text
Patrick Collison @patrickc
I think this reflects our confused spiritual relationship with our own past. We feel a strong and intuitive affinity for what we used to build, while our contemporary frameworks consider such buildings misguided and "immoral". The mind says massing prohibitions and setbacks; the heart says beautiful masonry, higgledy-piggledy layouts, and Victorians. Rather than resolve the tension, we designate large fractions of our city ontologically confused, simultaneously protected and illegal; sanctified and condemned.
[attached image]
Patrick Collison@patrickc

I'm interested in "trapped buildings": those that couldn't be built today (because of zoning and code changes) but also can't be substantially modified or demolished (because of historic protection rules). One of those phenomena that really makes one wonder what exactly we're trying to do.

Has anyone ever estimated what fraction of buildings in major cities fall into this category? When I asked Claude about San Francisco, it concluded: "If forced to give a single number with a single confidence rating: roughly 100,000 buildings, about two-thirds of San Francisco's physical structures, sit in the trap as a practical matter. Confidence: moderate. The number could be 70,000 or 130,000 depending on how strictly you operationalize 'can't be substantially modified.'"

Robert Balicki (👀 @IsographLabs)
I think there's also a case to be made that the models need input to provide good output. If you just say "write me a poem", it will write generic pap. Only if you provide paragraphs about what you actually want (and know what you want!) will it produce great output.

Coding is similar. The requirements for an agent doing code review vary widely! In some cases, you want to match what the codebase has, because it has good examples. In other cases, you want to improve upon it. In yet others, the code is throwaway and can be worse. A single, generic "review this code" instruction will fail to distinguish between those three.

So the first point is that you need to provide lots of detailed instructions. The second point is that most folks are too lazy to do that, or do not know how to steer the model, or do not see the benefit of doing this, or do not benefit from doing this in their use case. IMO that's likely to be the 95% use case. And the third point is that the massive labs need to focus on the 95% use case.

So the question becomes: will they provide a harness that adequately handles the 95% use case while also handling the leet correct-by-construction case? Maybe! The labs need correct-by-construction code internally, so they're motivated to do it. But it's not as slam-dunk a case as OP makes it seem. So I think it's likely that the market will sustain a better harness for those who care more about correctness, etc. than the average programmer.
Malte Ubl @cramforce
What Ryan says is false today. It could be true in the future (and Ryan could know that this future will exist) if the models keep the harness itself fully private. Because as long as the harness is public (Codex is open source, Claude can be easily reverse engineered using Codex), any custom harness can use quasi-identical tools (etc.) and differentiate at a layer orthogonal to the post-training target. Almost all the differentiation of a harness is already in the latter category.

But even if the harnesses go fully private (which would mean making them in-cloud APIs): just like abstractions were added on top of the models, abstractions will be added on top of the harness. The model labs might be the ones who make the best harness for their model, but they will not make the abstraction that you use to use their harness.
Ryan Lopopolo@_lopopolo

While alternative coding harnesses may have short-term lift, they will be bitter lesson'd away. I am bearish on any harness that doesn't come from the lab whose model you are using. You're fighting against post-training.

To put a finer point on this, you know how like, ioctls are like "huh that's weird but I guess whatever it's what we've got we can work with that"? It is exactly the same with like, the particular JSON construction the Codex shell tool uses. The model used to mangle nested quotes in this monstrosity RPC all the time, but now it does not, and it does not matter that the API is bad, because billions of failed invocations are used to train the model to the harness we have, not the harness we deserve.

Rhys @RhysSullivan
YouTube has completely ruined their homepage with Shorts slop
[attached image]
Mo @atmoio
the future of software engineering seems uncontroversially prompting + code review. startups will skip the code review because they're racing against time. larger/serious orgs will take code review very seriously.

llms can do code review, but my guess is that because they have to search through a large space, it will be as expensive to have, say, mythos review your code as it would be to have a senior dev.

based on budget:
$: prompting only
$$: low-grade llm review
$$$: mid-grade llm + dev review
$$$$: high-grade llm + sr dev review

btw, software (past the bootstrapping phase) will get more expensive to make and take more time. quality will remain exactly the same as when humans were doing it: shit.
Zack Korman@ZackKorman

Mandatory human-in-the-loop is a cybersecurity cop-out. People are giving agents more and more autonomy. We need solutions that accept that world because there is no stopping it. It's like telling people in the 90s to not use the internet to avoid getting hacked. Good luck.

Robert Balicki (👀 @IsographLabs) reposted
Andrew Neel @andrewneel
This might be the best footer I've ever seen. Well done @contralabs_ai
Andy Wang @pyrons_
@StatisticsFTW maybe publish an article about barnum? i still don't understand the 10-second elevator pitch
Robert Balicki (👀 @IsographLabs)
Literally every time I try to vibe code some PRs that are even moderately complicated, I regret not using Barnum.
Robert Balicki (👀 @IsographLabs)
@davidfowl It's the "roll your own blog engine"
Robert Balicki (👀 @IsographLabs)@StatisticsFTW

🎪 I'm extremely excited to announce the release of version 0.3 of Barnum!

Barnum is now a programming language for asynchronous and parallel computation whose goal is to make it extremely easy for you to orchestrate your agents!

So why use a programming language for this? Why not just use plan mode/a markdown file for the complicated cases?

Well, LLMs are incredibly powerful tools, but they certainly aren't reliable. If an LLM is in charge, you risk it changing its mind and implementing something else, or disabling unit tests and generally cutting corners. (Very relatable, to be honest.) And furthermore, it is hard to accurately express complicated workflows with loops and conditionals in prose.

The answer is to use a workflow engine. Barnum is a workflow engine masquerading as a programming language. When a workflow engine is in charge, your LLMs can't wriggle out of requirements, and it's easier to accurately describe the actual, complicated workflow. And it's this increase in reliability that allows you to build bigger, more impactful agentic workflows.

Already, I've used Barnum to ship hundreds of PRs. Other folks have used it to push forward on automated migrations, remove dead code, implement a RAG search pipeline, and validate all of the statements in publicly facing documentation.

I hope you give it a try! pnpm install @barnum/barnum

But read on for more cool details...
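The "workflow engine in charge" idea above can be sketched as ordinary code. This is a hypothetical TypeScript illustration, not Barnum's actual API: the names `fixUntilTestsPass`, `Agent`, and the stub agent are invented for this example. The point is only that loops, retry bounds, and the done-condition live in deterministic code, while the LLM merely fills in individual steps.

```typescript
// A minimal sketch of "the workflow engine is in charge" (NOT Barnum's
// real API; all names here are invented for illustration).
type Agent = (prompt: string) => Promise<string>;

// Stub agent for the example; a real setup would call an LLM here.
const stubAgent: Agent = async (prompt) => `patch for: ${prompt}`;

// The engine, not the model, owns the loop and the exit condition.
// The model cannot "decide" the tests don't matter or quietly stop early.
async function fixUntilTestsPass(
  agent: Agent,
  task: string,
  runTests: (patch: string) => boolean,
  maxAttempts = 3,
): Promise<string | null> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const patch = await agent(`${task} (attempt ${attempt})`);
    if (runTests(patch)) return patch; // requirement enforced in code
  }
  return null; // bounded retries: failure is reported, not papered over
}

// Usage: the (stubbed) tests pass on the second attempt.
let calls = 0;
fixUntilTestsPass(stubAgent, "remove dead code", () => ++calls === 2).then(
  (patch) => console.log(patch),
);
```

The contrast with a markdown plan is that the conditional ("keep going until tests pass, but give up after N tries") is executed deterministically rather than interpreted by the model.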

David Fowler
David Fowler@davidfowl·
I’m gearing up to build my own agent orchestration system. Are we all doing this now?? What stage of grief is this?
English
76
10
218
33.4K
Robert Balicki (👀 @IsographLabs)
@vasuman Folks on this thread should check out Barnum! It adds the determinism and structure that makes AI reliable. x.com/statisticsftw/…