Jack Friedson

125 posts

@JackFriedson

building something new · prev infra/product eng @haizelabs, applied AI @datadog

New York · Joined November 2023

481 Following · 81 Followers

Jack Friedson@JackFriedson·21h

the worst part is that if you let even like 10% of the slop slide then your agent suddenly starts treating it like a deliberate pattern and reproducing it all over the codebase

Jack Friedson@JackFriedson

another day of wondering whether the efficiency gains from coding agents really justify the amount of time I spend undoing their bullshit design choices

Jack Friedson@JackFriedson·1d

another day of wondering whether the efficiency gains from coding agents really justify the amount of time I spend undoing their bullshit design choices

Jack Friedson@JackFriedson·1d

one must imagine sisyphus happy

Jack Friedson@JackFriedson·2d

listen I'm not gonna say you shouldn't dunk on someone in a prototypical leopard face eating situation, but don't pretend that doing so is somehow constructive. you're doing it because it feels good. that's it. end of story.

William B. Fuckley@opinonhaver

A million people have rightly dunked on this guy, & I don’t care, I’m going to do it too, bc these people should have their catastrophic and massively consequential failures in judgement shoved in their faces forever. Sometimes a dog doesn’t learn unless you rub its face in it.

Jack Friedson@JackFriedson·3d

really don't get what all the hype is about, like this is supposed to be hard? I even got one with colors

Jack Friedson@JackFriedson·3d

@willccbb

will brown@willccbb·3d

need this as a skill

Anjney Midha@AnjneyMidha

if you run an ai lab, pls ensure your team has read this before putting any charts out into the world

Jack Friedson retweeted

Pay Roll Manager Here@UsingLyft·4d

He’s typing in a search bar, quick show him the search option he’s looking for. Perfect. He typed the next letter that is also the next letter in the option we just showed him so take that option away and show him an option that doesn’t match at all

Jack Friedson@JackFriedson·3d

all my homies hate scalar rewards

Ryan Bahlous-Boldi@RyanBoldi

Your RL post-training may be sabotaging your LLM’s test-time scaling! Conventional RL pretends that you can collapse all reward signals *upfront* into a single *scalar reward*. We introduce Vector Policy Optimization (VPO), which natively maximizes *vector-valued* rewards, boosting test time search performance, even on the original scalar.

Jack Friedson@JackFriedson·5d

@trashpandaemoji @pierrecomputer yes but we’re building GOOD ones

Trash Panda 🦝@trashpandaemoji·5d

@JackFriedson @pierrecomputer lmao, i think everyone and their mother is building a harness right now

Trash Panda 🦝@trashpandaemoji·5d

I'm quite pleased with how Neon Pilot's chat interface is coming out. Some neat features:
- Uses @pierrecomputer's diffs everywhere (amazing library)
- Tries to keep everything in the chat transcript inspect-able, including the system prompt. A lot of harnesses try to hide things. I believe you should be able to inspect everything.
- Tool calls get hidden in a shelf on every turn, except for tools that have things you might want to see (like diffs). These are pinned under the tool shelf. You can still open it up and see everything.
- Tool calls stream in and you can see the tool output but it collapses once it's done. You get the sensation stuff is happening, but it's all tucked away once the agent is done running so you can actually focus on the agent output and your original request.

Jack Friedson@JackFriedson·5d

@trashpandaemoji @pierrecomputer dawg we are literally building the same thing. I mean yours is "better" and has more "actual features", but they're ontologically equivalent

Trash Panda 🦝@trashpandaemoji·5d

@pierrecomputer Assistant output. Also notice you can fork or rewind to any user/assistant message. I really like being able to fork into a new conversation, use this feature a lot.

Jack Friedson@JackFriedson·19 May

"You're right to push back" YES I KNOW

Jack Friedson@JackFriedson·17 May

Rip David Lynch you would've loved this

Jack Friedson@JackFriedson·15 May

"Finding the right footing at the very beginning matters more than ever. In a world where it’s cheap to build almost anything, the real edge is choosing what is actually worth building and staying with it long enough to learn something the market doesn’t know yet."

Aditya Agarwal@adityaag

x.com/i/article/2054…

Jack Friedson@JackFriedson·15 May

codex: I did the refactor but kept the previous interface as a shim to avoid import churn
me: I will shoot you with a gun

Jack Friedson retweeted

Lakshya A Agrawal@LakshyAAAgrawal·13 May

Learning from rich textual feedback (errors, traces, partial reasoning) beats scalar reward alone for LLM optimization. GEPA demonstrated this for context-space optimization (prompts and agent harnesses), delivering frontier results at a fraction of the cost of RL. But context-only optimization is bounded by the base model's capability ceiling; weight updates can reach further.

Very excited about this new line of work on Fast-Slow Training (FST), which interleaves context and model weight optimization! The idea is a clean division of labor between two interleaved loops:

🔹 Fast loop (context): GEPA reads rich rollout feedback updating the context layer. The context becomes a fast-updating scratchpad of what the model needs to know about this task, right now.
🔹 Slow loop (model parameters): RL updates the model's parameters conditioned on the evolving context. Because the prompt already carries task-specific nuances, the model parameters are freed from absorbing them and focus on what actually generalizes across tasks and pushes the frontier.

⦁ 3× more sample-efficient than RL on math, code, and physics reasoning
⦁ ~70% lower KL divergence from base at matched accuracy
⦁ Plasticity preserved: FST checkpoints respond better to additional RL on new tasks than RL-only ones
⦁ Continual learning across changing tasks (HoVer → CodeIO → Physics) where RL stalls the moment the task switches

FST is a direction towards:
⦁ Addressing RL's pain points: entropy collapse, sparse rewards, long-horizon exploration
⦁ Providing a clean channel for rich feedback into weight updates
⦁ Demonstrating model-harness co-evolution
⦁ Discovery: Using fast context updates for broad exploration, while leveraging a continually improving model.

Check out the full thread below:

Kusha Sareen@KushaSareen

Can LLMs adapt continually without losing base skills? Fast-Slow Training (FST) pairs "slow" weights with "fast" context.

FST vs. RL:
• 3x more sample-efficient
• Higher performance ceiling
• Less KL drift (better plasticity)
• Continual learning: succeeds where RL stalls

Jack Friedson@JackFriedson·14 May

Great example of why you should
1. Do the tiniest bit of diligence to check if the tweet you're quoting to shill your product is in fact a real example of the problem your product solves or if it's just an elaborate joke
2. That's it, just the first one

Brace@BraceSproul

Great example of why you should
1. Run your agent on a separate machine from the sandbox it uses (e.g. sandbox as a tool)
2. Never set env vars in your sandbox. Instead, use something like LangSmith’s sandbox proxy auth (reqs are intercepted as they leave the sandbox and secrets are injected, that way the secret never enters the sandbox)

Jack Friedson@JackFriedson·13 May

as someone who is (boldly, courageously) building the exact same software factory app as everyone else on this website, I tend to agree but I think there's some nuance. namely that building a bespoke harness can teach you how to use existing ones much more effectively. like if I wrote a custom rdbms I probably wouldn't use it in production, but I'd get a hell of a lot better at using postgres

but yeah if the thing you're optimizing for is doing your current work faster or hitting RSI escape velocity before big labs, then building a bespoke harness is not going to get you there (the only notable exception to this is, of course, me, who is special and different)

Tenobrus@tenobrus·13 May

this doesn't feel rational to me at all. ime gains from "agentic coding techniques" or scaffolding or orchestrators are *incredibly diminishing*, whatever u settle on the first week is ~as good as its gonna get. and the actual productivity increases come from new model releases, plus first party or well funded third party scaffoldings often timed pretty close to those releases. i think the rational thing to do is mostly to just actually work on shit with the tools you have, not try to build your own half functioning software factory that doesn't actually make you much more productive, and after two more model iterations just start using whatever the standard swarm scaffolds are that are actually battle-tested. people who spent the last three months making their own hypertuned custom openclaw setups to respond to every inbound email with a PR that gets shipped to prod didn't "build a factory", they kinda just wasted their time

Justin Murphy@jmrphy

The big problem with getting any kind of substantive work done right now is that it feels so much more rational to work on the factory that produces the substantive work, instead of the work itself. It’s like the woodsman who was once asked: “What would you do if you had just five minutes to chop down a tree?" The woodsman: "I would spend the first two and a half minutes sharpening my axe.” Well, what if you only had 30 years left to achieve greatness? "You should probably spend 15 years building the most effectively calibrated and defensible, recursively self-improving systems for the production of great work.” AI psychosis or a perfectly rational strategy?

Jack Friedson@JackFriedson·12 May

Cathie Wood is the Jim Cramer of investing

 Q-Cap @qcapital2020

How tf is this even possible

Jack Friedson@JackFriedson·12 May

Jack Friedson@JackFriedson

big downside of agentic coding is that it's much harder to fight shiny object syndrome because you can do an enormous rewrite in like an hour
