Jonathan Chang
@ChangJonathanC
2.5K posts

ML/AI Engineer, building https://t.co/uEbfxzEztO
Taiwan · Joined May 2020
954 Following · 1.6K Followers
shako @shakoistsLog
coding models love 'isinstance' so much. i tend to think, though, that if you're using 'isinstance' your system is janky.
Jonathan Chang @ChangJonathanC
@bigeagle_xd knowledge distillation works better with a weaker teacher model, you can try that too
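For readers who want the concrete mechanics: below is a minimal sketch of the standard soft-target distillation loss (temperature-scaled KL against the teacher, mixed with cross-entropy on hard labels). The function name, temperature, and mixing weight are illustrative defaults, not anything specific to the experiment discussed above; nothing in the loss assumes the teacher is stronger than the student, so swapping in a weaker teacher only changes where `teacher_logits` comes from.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style knowledge distillation: KL between temperature-softened
    teacher and student distributions, mixed with hard-label cross-entropy."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # scale by T^2 so gradient magnitudes stay comparable across temperatures
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```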
熊师傅 weight decay 了吗
I've been learning to snowboard on an indoor ski machine lately. The coach teaches verbally what to do in each situation, but what he means by "press", "twist", and "release the pressure" is clearly not what I understand by those words, so I found it very hard to actively control the board and it was exhausting.

Eventually I just gave up: I let the board do a random walk and only made sure I didn't fall, while memorizing as much as I could about the board's motion and my own body state at the time. Once I'd memorized enough, whenever I wanted to take control I'd search my memory for a matching scenario; if there was one, I'd do my best to replay it. If I happened to keep control, that became a positive sample; if not, I'd give up control and let the board keep moving randomly. I found this way of learning very low-effort and very efficient.

Thinking about it, the recipe looks a lot like: first sample randomly to learn the prior distribution, then use reinforcement learning to raise the probability of the target distribution. Pretrain + RL really is the efficient way. On the other hand, the coach has the curse of knowledge: he genuinely can't remember what "not knowing how to snowboard at all" feels like, so he gave me a lot of SFT data I couldn't learn from 🥲
Jonathan Chang @ChangJonathanC
@shiraeis imo codex is pretty option-preserving: it doesn't do more than what you ask for and leaves the option open for the user to decide (if you want, i can ... )
shira @shiraeis
Found a paper that suggests we may have spent years training agents to become hunters of proxy reward when the more basic thing intelligence craves is not a reward at all, but to not run out of viable futures. The paper proposes that behavior is best understood as maximizing future action-state path occupancy, which collapses mathematically into a discounted entropy objective. The agent doesn't necessarily want to GET something, but rather is trying to keep as many meaningful trajectories alive as possible.

The obvious objection is "so it just does random shit? fuck around and find out?" No, this is where it gets pretty beautiful. The agent is variable when variation is cheap and becomes surgically goal-oriented the moment an absorbing state (death, starvation, falling over, etc.) gets close enough to threaten its future path space. Variability is the same drive as goal-directedness, just operating under different constraints.

The demos are kinda wild:
- A cartpole (the classic move-a-cart-to-keep-a-pole-from-falling control task) that doesn't merely balance but dances and swings through a huge range of angles and positions, because why not? The whole point is occupying state space, and rigid balance is a voluntarily impoverished life.
- A prey-predator gridworld where the mouse PLAYS with the cat, teasing it and using both clockwise and counterclockwise routes around obstacles to lure it away from the food source before slipping in to eat, using both routes roughly equally. A reward-maximizing agent would collapse to one strategy and exploit it. Here, the agent keeps its behavioral repertoire.
- A quadruped trained with Soft Actor-Critic and ZERO external reward that learns to walk, jump, spin, and stabilize, and then makes a beeline for food only when its internal energy drops low enough that starvation becomes a real threat.

The thing that hit me hardest is the comparison to empowerment and free energy principle agents. Both collapse to near-deterministic policies with almost no behavioral variability: empowerment agents find the highest-empowerment state and exploit it, and FEP agents converge to classical reward maximizers. As far as I'm aware, this is the only framework that produces agents you could describe as being "alive."

The AI implication here is that we undertrain for behavioral repertoire. Most systems hit the benchmark by collapsing onto a narrow attractor basin of good-enough trajectories. They're competent for sure, but brittle too, with one viable plan, executed until the world shifts and leaves them with nothing. The thing I increasingly want from agents isn't competence per se, but option-preserving competence. I want agents with the ability to keep multiple viable plans alive and switch between them without catastrophe.

We've been so focused on teaching agents what to want that we never stopped to ask what happens if wanting isn't the point, if the deepest drive isn't necessarily toward anything, but away from the walls closing in.

paper: nature.com/articles/s4146…
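The paper's full objective is richer than this, but a toy version shows the key behavior the tweet describes: value comes only from keeping future choices alive, so the policy is close to uniform when variation is cheap and becomes sharply goal-directed next to an absorbing state. The sketch below assumes a deterministic 1-D chain and counts only action entropy; the setup and names are mine, not the paper's exact formulation.

```python
import numpy as np

# Toy "discounted action-entropy" agent on a 1-D chain with an absorbing state
# at the left end. The agent maximizes E[sum_t gamma^t H(pi(.|s_t))]: there is
# no external reward, only the entropy of future action choices.

N = 10              # states 0..N-1; state 0 is absorbing ("fallen over")
gamma = 0.95
actions = [-1, +1]  # step left or right

def step(s, a):
    if s == 0:                       # absorbing: no future entropy to collect
        return 0
    return min(max(s + a, 0), N - 1)

V = np.zeros(N)
for _ in range(500):
    V_new = np.zeros(N)
    for s in range(1, N):
        # soft backup: V(s) = log sum_a exp(gamma * V(s')), whose maximizing
        # policy is pi(a|s) ∝ exp(gamma * V(s')), i.e. it prefers successors
        # that preserve the most future entropy.
        q = np.array([gamma * V[step(s, a)] for a in actions])
        V_new[s] = np.log(np.sum(np.exp(q)))
    V = V_new

for s in [1, 2, N // 2]:
    q = np.array([gamma * V[step(s, a)] for a in actions])
    pi = np.exp(q - q.max())
    pi /= pi.sum()
    print(s, dict(zip(["left", "right"], np.round(pi, 3))))
```

Running it, the policy in the middle of the chain is roughly 50/50 (variation is cheap there), while one step from the absorbing state it puts essentially all its probability on moving away.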
Jonathan Chang @ChangJonathanC
$NET fell 10% after a new Claude release
Jonathan Chang @ChangJonathanC
@giffmana the other day I asked it to adjust my fan speed. At first the agent said it might not be possible to do so, and then I asked it to try to find a way to do it. voila
Lucas Beyer (bl16) @giffmana
Coding agents are such game-changers for linux. For almost anything that doesn't work, in the past I would have spent the afternoon, or even a whole weekend, scouring forums and trying many, many things before fixing it or giving up. Now I just point codex and claude at it (and, crossing fingers, soon our model :).

Latest example: today I wanted to install ibkr's desktop app, a java-based monstrosity. I'm using wayland/sway on hidpi, and these two things don't go together well: fonts are awfully pixelated, and the ui scale is completely wrong. ChatGPT gave up after I told it two ideas it suggested didn't work (some scaling-related env vars). Muse Spark had a nice new idea of patching its java Qt libraries, but it had outdated paths left and right. I then copy-pasted Muse Spark's idea into codex, telling codex to give it a shot and adjust as needed. And codex went ahead and did it, fetching the files out of older versions of arch packages via curl to patch ibkr's built-in ones. And it works! Now I'm enjoying the high-resolution version of the program without having to wait months and months for the developer to fix this rare corner case!
Alexandr Wang @alexandr_wang
up to #3, coming for the crown 👑 that being said, MONOPOLY GO!Chat is now #1, so i’m learning a lot about the App Store
Jonathan Chang @ChangJonathanC
@theo @JoshRadDev this reasoning-budget-in-the-system-prompt thing has existed since sonnet 3.7. you can talk to that model and see it yourself. the doc also says changing it invalidates the prompt cache
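For context on what "reasoning budget" refers to: in Anthropic's public API the thinking budget is an explicit per-request parameter. Below is a sketch of setting it, based on my reading of the docs; the model string, token numbers, and prompt are just examples. The tweet's claims, that the budget also shows up in the system prompt and that changing it invalidates the prompt cache, are about what happens behind this knob.

```python
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

resp = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=8192,  # must exceed the thinking budget
    # extended thinking: the reasoning budget is set per request
    thinking={"type": "enabled", "budget_tokens": 4096},
    messages=[{"role": "user", "content": "Walk me through 37 * 43 step by step."}],
)
print(resp.content)
```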
Theo - t3.gg @theo
@JoshRadDev Any evidence of this? The leaked system prompts don't include any information about this
Jonathan Chang @ChangJonathanC
the real programs are the prompts we write along the way
Jonathan Chang @ChangJonathanC
RL environments for designing RL environments: is anyone working on this?
roon @tszzl
@karpathy @soumitrashukla9 non-technical people are downloading something called openclaw and using it in their terminal?
Andrej Karpathy @karpathy
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.

TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy @staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

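To make property 1) concrete: a "verifiable reward" for coding is just a function that executes the model's output against tests and returns pass/fail, with no human judge in the loop. A minimal sketch follows, assuming pytest is available; the helper name and file layout are illustrative, not any lab's actual training harness.

```python
import subprocess
import tempfile
from pathlib import Path

def verifiable_reward(candidate_code: str, test_code: str, timeout: int = 30) -> float:
    """Binary reward: 1.0 if the model's code passes the unit tests, else 0.0.
    Illustrative only; real RL pipelines add sandboxing, partial credit, etc."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(candidate_code)
        Path(tmp, "test_solution.py").write_text(test_code)
        try:
            result = subprocess.run(
                ["python", "-m", "pytest", "-q", "test_solution.py"],
                cwd=tmp, capture_output=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # hanging code counts as a failure
        return 1.0 if result.returncode == 0 else 0.0

# The reward needs no human judgment, just an exit code.
print(verifiable_reward(
    "def add(a, b):\n    return a + b\n",
    "from solution import add\n\ndef test_add():\n    assert add(2, 3) == 5\n",
))
```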
ludwig @ludwigABAP
@leostera @kitlangton has nothing to do with ghostty, you just have a memory leak somewhere in your mountain of claude code slop I imagine
Leo 🏴‍☠️ @leostera
ghostty what the hell are you doing (and shoutout to @kitlangton's hex, amazing lil' app! 👏)
jenny wen @jenny_wen
men love to make an app and call it “flow”
Jonathan Chang @ChangJonathanC
which is a bigger capability jump?