Robert Balicki (👀 @IsographLabs)

17.3K posts

@StatisticsFTW

@isographlabs framework author. Currently @Pinterest. Ex-React Data Team @Facebook. Co-organizer of #RustNYC. I like Rust, Relay, stats, GraphQL, React, JS

Manhattan, NY · Joined April 2012
1.3K Following · 1.7K Followers
R 'Nearest' Nabors @rachelnabors
Which would you log into to keep up with your socials and news?
[two attached images]
Robert Balicki (👀 @IsographLabs)
"p" and "o" are very near each other on the keyboard, yet "np" and "no" have very different meanings when responding to a text
Patrick Collison @patrickc
I think this reflects our confused spiritual relationship with our own past. We feel a strong and intuitive affinity for what we used to build, while our contemporary frameworks consider such buildings misguided and "immoral". The mind says massing prohibitions and setbacks; the heart says beautiful masonry, higgledy-piggledy layouts, and Victorians. Rather than resolve the tension, we designate large fractions of our city ontologically confused, simultaneously protected and illegal; sanctified and condemned.
[attached image]
Patrick Collison@patrickc

I'm interested in "trapped buildings": those that couldn't be built today (because of zoning and code changes) but also can't be substantially modified or demolished (because of historic protection rules). One of those phenomena that really makes one wonder what exactly we're trying to do.

Has anyone ever estimated what fraction of buildings in major cities fall into this category? When I asked Claude about San Francisco, it concluded: "If forced to give a single number with a single confidence rating: roughly 100,000 buildings, about two-thirds of San Francisco's physical structures, sit in the trap as a practical matter. Confidence: moderate. The number could be 70,000 or 130,000 depending on how strictly you operationalize 'can't be substantially modified.'"

Robert Balicki (👀 @IsographLabs)
I think there's also a case to be made that the models need input to provide good output. If you just say "write me a poem", it will write generic pap. Only if you provide paragraphs about what you actually want (and know what you want!) will it produce great output.

Coding is similar. The requirements for an agent doing code review vary widely! In some cases, you want to match what the codebase has, because it has good examples. In other cases, you want to improve upon it. In yet others, the code is throwaway and can be worse. A single, generic "review this code" instruction will fail to distinguish between those three.

So the first point is that you need to provide lots of detailed instructions. The second point is that most folks are too lazy to do that, or do not know how to steer the model, or do not see the benefit of doing this, or do not benefit from doing this in their use case. IMO that's likely to be the 95% use case. And the third point is that the massive labs need to focus on the 95% use case.

So the question becomes: will they provide a harness that adequately handles the 95% use case while also handling the leet correct-by-construction case? Maybe! The labs need correct-by-construction code internally, so they're motivated to do it. But it's not as slam-dunk a case as OP makes it seem. So I think it's likely that the market will sustain a better harness for those who care more about correctness, etc. than the average programmer.
Malte Ubl @cramforce
What Ryan says is false today. It could be true in the future (and Ryan could know that this future will exist) if the models keep the harness itself fully private. Because as long as the harness is public (Codex is open source, Claude can be easily reverse engineered using Codex), any custom harness can use quasi-identical tools (etc.) and differentiate at a layer orthogonal to the post-training target. Almost all the differentiation of a harness is already in the latter category.

But even if the harnesses go fully private (which would mean making them in-cloud APIs): just like abstractions were added on top of the models, abstractions will be added on top of the harness. The model labs might be the ones who make the best harness for their model, but they will not make the abstraction that you use to use their harness.
Ryan Lopopolo@_lopopolo

While alternative coding harnesses may have short-term lift, they will be bitter lesson'd away. I am bearish on any harness that doesn't come from the lab whose model you are using. You're fighting against post-training.

To put a finer point on this, you know how like, ioctls are like "huh that's weird but I guess whatever it's what we've got we can work with that"? It is exactly the same with like, the particular JSON construction the Codex shell tool uses. The model used to mangle nested quotes in this monstrosity RPC all the time, but now it does not, and it does not matter that the API is bad, because billions of failed invocations are used to train the model to the harness we have, not the harness we deserve.

Rhys @RhysSullivan
YouTube has completely ruined their homepage with Shorts slop
[attached image]
Mo @atmoio
the future of software engineering seems uncontroversially prompting + code review. startups will skip the code review because they're racing against time. larger/serious orgs will take code review very seriously.

llms can do code review, but my guess is that because they have to search through a large space, it will be as expensive to have, say, mythos review your code as it would be to have a senior dev.

based on budget:
$: prompting only
$$: low-grade llm review
$$$: mid-grade llm + dev review
$$$$: high-grade llm + sr dev review

btw, software (past the bootstrapping phase) will get more expensive to make and take more time. quality will remain exactly the same as when humans were doing it: shit.
Zack Korman@ZackKorman

Mandatory human-in-the-loop is a cybersecurity cop-out. People are giving agents more and more autonomy. We need solutions that accept that world because there is no stopping it. It's like telling people in the 90s to not use the internet to avoid getting hacked. Good luck.

Robert Balicki (👀 @IsographLabs) reposted
Andrew Neel @andrewneel
This might be the best footer I've ever seen. Well done @contralabs_ai
Andy Wang @pyrons_
@StatisticsFTW maybe publish an article about barnum? i still don't understand the 10-second elevator pitch
Robert Balicki (👀 @IsographLabs)
Literally every time I try to vibe code some PRs that are even moderately complicated, I regret not using Barnum.
Robert Balicki (👀 @IsographLabs)
@davidfowl It's the "roll your own blog engine"
Robert Balicki (👀 @IsographLabs)@StatisticsFTW

🎪 I'm extremely excited to announce the release of version 0.3 of Barnum!

Barnum is now a programming language for asynchronous and parallel computation whose goal is to make it extremely easy for you to orchestrate your agents!

So why use a programming language for this? Why not just use plan mode/a markdown file for the complicated cases?

Well, LLMs are incredibly powerful tools, but they certainly aren't reliable. If an LLM is in charge, you risk it changing its mind and implementing something else, or disabling unit tests and generally cutting corners. (Very relatable, to be honest.) And furthermore, it is hard to accurately express complicated workflows with loops and conditionals in prose.

The answer is to use a workflow engine. Barnum is a workflow engine masquerading as a programming language. When a workflow engine is in charge, your LLMs can't wriggle out of requirements, and it's easier to accurately describe the actual, complicated workflow. And it's this increase in reliability that allows you to build bigger, more impactful agentic workflows.

Already, I've used Barnum to ship hundreds of PRs. Other folks have used it to push forward on automated migrations, remove dead code, implement a RAG search pipeline, and validate all of the statements in publicly facing documentation.

I hope you give it a try! pnpm install @barnum/barnum

But read on for more cool details...
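The "workflow engine in charge" idea above can be sketched as ordinary code. This is a hypothetical TypeScript illustration, not Barnum's actual API: the names `fixUntilTestsPass`, `Agent`, and the stub agent are invented for this example. The point is only that loops, retry bounds, and the done-condition live in deterministic code, while the LLM merely fills in individual steps.

```typescript
// A minimal sketch of "the workflow engine is in charge" (NOT Barnum's
// real API; all names here are invented for illustration).
type Agent = (prompt: string) => Promise<string>;

// Stub agent for the example; a real setup would call an LLM here.
const stubAgent: Agent = async (prompt) => `patch for: ${prompt}`;

// The engine, not the model, owns the loop and the exit condition.
// The model cannot "decide" the tests don't matter or quietly stop early.
async function fixUntilTestsPass(
  agent: Agent,
  task: string,
  runTests: (patch: string) => boolean,
  maxAttempts = 3,
): Promise<string | null> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const patch = await agent(`${task} (attempt ${attempt})`);
    if (runTests(patch)) return patch; // requirement enforced in code
  }
  return null; // bounded retries: failure is reported, not papered over
}

// Usage: the (stubbed) tests pass on the second attempt.
let calls = 0;
fixUntilTestsPass(stubAgent, "remove dead code", () => ++calls === 2).then(
  (patch) => console.log(patch),
);
```

The contrast with a markdown plan is that the conditional ("keep going until tests pass, but give up after N tries") is executed deterministically rather than interpreted by the model.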

David Fowler
David Fowler@davidfowl·
I’m gearing up to build my own agent orchestration system. Are we all doing this now?? What stage of grief is this?
English
76
10
218
33.4K
Robert Balicki (👀 @IsographLabs)
@vasuman Folks on this thread should check out Barnum! It adds the determinism and structure that makes AI reliable. x.com/statisticsftw/…