Jonathan Chang
@ChangJonathanC
2.5K posts

ML/AI Engineer, building https://t.co/uEbfxzEztO
Taiwan · Joined May 2020
954 Following · 1.6K Followers
shako @shakoistsLog
coding models love 'isinstance' so much. i tend to think, though, that if you're using 'isinstance' your system is janky.
Jonathan Chang @ChangJonathanC
@bigeagle_xd knowledge distillation works better with a weaker teacher model, you can try that too
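For readers who want the concrete mechanics: below is a minimal sketch of the standard soft-target distillation loss (temperature-scaled KL against the teacher, mixed with cross-entropy on hard labels). The function name, temperature, and mixing weight are illustrative defaults, not anything specific to the experiment discussed above; nothing in the loss assumes the teacher is stronger than the student, so swapping in a weaker teacher only changes where `teacher_logits` comes from.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style knowledge distillation: KL between temperature-softened
    teacher and student distributions, mixed with hard-label cross-entropy."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # scale by T^2 so gradient magnitudes stay comparable across temperatures
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```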
熊师傅 weight decay 了吗
I've been learning to snowboard on an indoor ski machine lately. The coach teaches verbally what to do in each situation, but what he means by "press", "twist", and "release the pressure" is clearly not what I understand by those words, so I found it very hard to actively control the board and it was exhausting.

Eventually I just gave up: I let the board do a random walk and only made sure I didn't fall, while memorizing as much as I could about the board's motion and my own body state at the time. Once I'd memorized enough, whenever I wanted to take control I'd search my memory for a matching scenario; if there was one, I'd do my best to replay it. If I happened to keep control, that became a positive sample; if not, I'd give up control and let the board keep moving randomly. I found this way of learning very low-effort and very efficient.

Thinking about it, the recipe looks a lot like: first sample randomly to learn the prior distribution, then use reinforcement learning to raise the probability of the target distribution. Pretrain + RL really is the efficient way. On the other hand, the coach has the curse of knowledge: he genuinely can't remember what "not knowing how to snowboard at all" feels like, so he gave me a lot of SFT data I couldn't learn from 🥲
Jonathan Chang @ChangJonathanC
@shiraeis imo codex is pretty option-preserving: it doesn't do more than what you ask for and leaves the option open for the user to decide (if you want, i can ... )
shira @shiraeis
Found a paper that suggests we may have spent years training agents to become hunters of proxy reward when the more basic thing intelligence craves is not a reward at all, but to not run out of viable futures. The paper proposes that behavior is best understood as maximizing future action-state path occupancy, which collapses mathematically into a discounted entropy objective. The agent doesn't necessarily want to GET something, but rather is trying to keep as many meaningful trajectories alive as possible.

The obvious objection is "so it just does random shit? fuck around and find out?" No, this is where it gets pretty beautiful. The agent is variable when variation is cheap and becomes surgically goal-oriented the moment an absorbing state (death, starvation, falling over, etc.) gets close enough to threaten its future path space. Variability is the same drive as goal-directedness, just operating under different constraints.

The demos are kinda wild:
- A cartpole (the classic move-a-cart-to-keep-a-pole-from-falling control task) that doesn't merely balance but dances and swings through a huge range of angles and positions, because why not? The whole point is occupying state space, and rigid balance is a voluntarily impoverished life.
- A prey-predator gridworld where the mouse PLAYS with the cat, teasing it and using both clockwise and counterclockwise routes around obstacles to lure it away from the food source before slipping in to eat, using both routes roughly equally. A reward-maximizing agent would collapse to one strategy and exploit it. Here, the agent keeps its behavioral repertoire.
- A quadruped trained with Soft Actor-Critic and ZERO external reward that learns to walk, jump, spin, and stabilize, and then makes a beeline for food only when its internal energy drops low enough that starvation becomes a real threat.

The thing that hit me hardest is the comparison to empowerment and free energy principle agents. Both collapse to near-deterministic policies with almost no behavioral variability: empowerment agents find the highest-empowerment state and exploit it, and FEP agents converge to classical reward maximizers. As far as I'm aware, this is the only framework that produces agents you could describe as being "alive."

The AI implication here is that we undertrain for behavioral repertoire. Most systems hit the benchmark by collapsing onto a narrow attractor basin of good-enough trajectories. They're competent for sure, but brittle too, with one viable plan, executed until the world shifts and leaves them with nothing. The thing I increasingly want from agents isn't competence per se, but option-preserving competence. I want agents with the ability to keep multiple viable plans alive and switch between them without catastrophe.

We've been so focused on teaching agents what to want that we never stopped to ask what happens if wanting isn't the point, if the deepest drive isn't necessarily toward anything, but away from the walls closing in.

paper: nature.com/articles/s4146…
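The paper's full objective is richer than this, but a toy version shows the key behavior the tweet describes: value comes only from keeping future choices alive, so the policy is close to uniform when variation is cheap and becomes sharply goal-directed next to an absorbing state. The sketch below assumes a deterministic 1-D chain and counts only action entropy; the setup and names are mine, not the paper's exact formulation.

```python
import numpy as np

# Toy "discounted action-entropy" agent on a 1-D chain with an absorbing state
# at the left end. The agent maximizes E[sum_t gamma^t H(pi(.|s_t))]: there is
# no external reward, only the entropy of future action choices.

N = 10              # states 0..N-1; state 0 is absorbing ("fallen over")
gamma = 0.95
actions = [-1, +1]  # step left or right

def step(s, a):
    if s == 0:                       # absorbing: no future entropy to collect
        return 0
    return min(max(s + a, 0), N - 1)

V = np.zeros(N)
for _ in range(500):
    V_new = np.zeros(N)
    for s in range(1, N):
        # soft backup: V(s) = log sum_a exp(gamma * V(s')), whose maximizing
        # policy is pi(a|s) ∝ exp(gamma * V(s')), i.e. it prefers successors
        # that preserve the most future entropy.
        q = np.array([gamma * V[step(s, a)] for a in actions])
        V_new[s] = np.log(np.sum(np.exp(q)))
    V = V_new

for s in [1, 2, N // 2]:
    q = np.array([gamma * V[step(s, a)] for a in actions])
    pi = np.exp(q - q.max())
    pi /= pi.sum()
    print(s, dict(zip(["left", "right"], np.round(pi, 3))))
```

Running it, the policy in the middle of the chain is roughly 50/50 (variation is cheap there), while one step from the absorbing state it puts essentially all its probability on moving away.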
Jonathan Chang @ChangJonathanC
$NET fell 10% after a new Claude release
Jonathan Chang @ChangJonathanC
@giffmana the other day I asked it to adjust my fan speed. At first the agent said it might not be possible to do so, and then I asked it to try to find a way to do it. voila
Lucas Beyer (bl16) @giffmana
Coding agents are such game-changers for linux. For almost anything that doesn't work, in the past I would have spent the afternoon, or even a whole weekend, scouring forums and trying many, many things before fixing it or giving up. Now I just point codex and claude at it (and, crossing fingers, soon our model :).

Latest example: today I wanted to install ibkr's desktop app, a java-based monstrosity. I'm using wayland/sway on hidpi, and these two things don't go together well: fonts are awfully pixelated, and the ui scale is completely wrong. ChatGPT gave up after I told it two ideas it suggested didn't work (some scaling-related env vars). Muse Spark had a nice new idea of patching its java Qt libraries, but it had outdated paths left and right. I then copy-pasted Muse Spark's idea into codex, telling codex to give it a shot and adjust as needed. And codex went ahead and did it, fetching the files out of older versions of arch packages via curl to patch ibkr's built-in ones. And it works! Now I'm enjoying the high-resolution version of the program without having to wait months and months for the developer to fix this rare corner case!
Alexandr Wang @alexandr_wang
up to #3, coming for the crown 👑 that being said, MONOPOLY GO!Chat is now #1, so i’m learning a lot about the App Store
Jonathan Chang @ChangJonathanC
@theo @JoshRadDev this reasoning-budget-in-the-system-prompt thing has existed since sonnet 3.7. you can talk to that model and see it yourself. the doc also says changing it invalidates the prompt cache
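For context on what "reasoning budget" refers to: in Anthropic's public API the thinking budget is an explicit per-request parameter. Below is a sketch of setting it, based on my reading of the docs; the model string, token numbers, and prompt are just examples. The tweet's claims, that the budget also shows up in the system prompt and that changing it invalidates the prompt cache, are about what happens behind this knob.

```python
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

resp = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=8192,  # must exceed the thinking budget
    # extended thinking: the reasoning budget is set per request
    thinking={"type": "enabled", "budget_tokens": 4096},
    messages=[{"role": "user", "content": "Walk me through 37 * 43 step by step."}],
)
print(resp.content)
```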
Theo - t3.gg @theo
@JoshRadDev Any evidence of this? The leaked system prompts don't include any information about this
Jonathan Chang @ChangJonathanC
the real programs are the prompts we write along the way
Jonathan Chang @ChangJonathanC
RL environments for designing RL environments: is anyone working on this?
roon @tszzl
@karpathy @soumitrashukla9 non-technical people are downloading something called openclaw and using it in their terminal?
Andrej Karpathy @karpathy
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.

TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy @staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

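To make property 1) concrete: a "verifiable reward" for coding is just a function that executes the model's output against tests and returns pass/fail, with no human judge in the loop. A minimal sketch follows, assuming pytest is available; the helper name and file layout are illustrative, not any lab's actual training harness.

```python
import subprocess
import tempfile
from pathlib import Path

def verifiable_reward(candidate_code: str, test_code: str, timeout: int = 30) -> float:
    """Binary reward: 1.0 if the model's code passes the unit tests, else 0.0.
    Illustrative only; real RL pipelines add sandboxing, partial credit, etc."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(candidate_code)
        Path(tmp, "test_solution.py").write_text(test_code)
        try:
            result = subprocess.run(
                ["python", "-m", "pytest", "-q", "test_solution.py"],
                cwd=tmp, capture_output=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # hanging code counts as a failure
        return 1.0 if result.returncode == 0 else 0.0

# The reward needs no human judgment, just an exit code.
print(verifiable_reward(
    "def add(a, b):\n    return a + b\n",
    "from solution import add\n\ndef test_add():\n    assert add(2, 3) == 5\n",
))
```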
ludwig @ludwigABAP
@leostera @kitlangton has nothing to do with ghostty, you just have a memory leak somewhere in your mountain of claude code slop I imagine
Leo 🏴‍☠️ @leostera
ghostty what the hell are you doing (and shoutout to @kitlangton's hex, amazing lil' app! 👏)
jenny wen @jenny_wen
men love to make an app and call it “flow”
Jonathan Chang @ChangJonathanC
which is a bigger capability jump?