girish

556 posts

@googrish

@castformai prev @scifivc @stanfordsymsys

Joined October 2012
320 Following · 1.7K Followers
Pinned Tweet
girish
girish@googrish·
“don’t train your own model” is common ai advice. it's wrong. your token bill's the proof. today, we’re excited to launch castform into open preview. castform is the easiest way for you to train your own model, on your own data. open-weights models are performant and much cheaper. when trained on your task & proprietary data, they beat closed models. the thing standing between you and that was weeks of plumbing & years of ml expertise. with castform, model training is as simple as prompt engineering. @castformai bring your agent traces or raw corpora. castform turns it into training data, picks the right algorithmic recipes, manages gpus, and gives you an ide to watch and chat with your model as it learns. see what you can build with castform👇
228
236
2.6K
442.3K
girish
girish@googrish·
@matt_slotnick but isn't the point of opensource that "serving" can be distributed? instead of a few players trying to serve for hyperscale, you get lots of folks serving models for narrower segments
1
0
0
195
Matt Slotnick
Matt Slotnick@matt_slotnick·
i’d bet a nontrivial amount of the frontier lab value prop is actually in the infrastructure to serve intelligence reliably and at scale. people expecting open source to massively cannibalize frontier spend are going to be waiting a while. massive scale infrastructure is hard
5
1
30
5K
Samian
Samian@ApplyWiseAi·
@googrish ok the die rolling thing is a genuinely underrated eval. clean toy problem with obvious ground truth. curious what your reward signal looked like.. straight KL on uniform or something sneakier?
1
0
1
32
girish
girish@googrish·
lots of talk about agi, asi, rsi but ask any frontier LLM to roll a die and it will almost always say "4." claude, gpt, kimi - doesn't matter, 4.4.4.4. so here's how i post-trained a model to reliably roll a die (i.e. each number ~1/6th of the time) & why it's a nice sandbox for one of the most interesting problems in rl i.e. getting a model to actually explore instead of just following strategies it already knows 🧵
2
5
16
768
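One way to put a number on the "each number ~1/6th of the time" target is the KL divergence between the observed roll frequencies and the uniform distribution. The sketch below is an illustration only: the thread does not say which reward or metric was actually used, and `kl_from_uniform` is a hypothetical helper name.

```python
import math
from collections import Counter


def kl_from_uniform(rolls, sides=6):
    """Return KL(empirical || uniform) in nats for rolls on faces 1..sides.

    0.0 means the observed frequencies are exactly uniform; log(sides) is the
    worst case, reached when every roll is the same face.
    """
    if not rolls:
        raise ValueError("need at least one roll")
    if any(r < 1 or r > sides for r in rolls):
        raise ValueError("every roll must be an integer in 1..sides")
    counts = Counter(rolls)
    n = len(rolls)
    kl = 0.0
    for face in range(1, sides + 1):
        p = counts.get(face, 0) / n
        if p > 0:  # faces that never appear contribute 0 to the sum
            kl += p * math.log(p * sides)  # p * log(p / (1 / sides))
    return kl


# A model that always answers "4" versus one that covers every face evenly.
collapsed = [4] * 60
balanced = [1, 2, 3, 4, 5, 6] * 10
print(kl_from_uniform(collapsed), kl_from_uniform(balanced))
```

A lower value means the sampled rolls are closer to a fair die, so the negative of this quantity over a batch of samples could serve as a batch-level score. Whether that matches the setup in the thread is an assumption.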
girish
girish@googrish·
two open questions remain:
- clustering: exact match works for dice, but how do you identify genuinely novel reasoning paths in long agentic traces?
- what if the right answer is far from base capabilities? reward shaping still needs the model to stumble there first.
1
0
0
59
girish
girish@googrish·
cool application of llm post-training: improving the "emotional intelligence" of a language model.

most other emotional intelligence evals tend to just check if a model can indicate what a person's feeling. this one's cooler -> checks if a model can actually change a person's mood across a multi-turn conversation.

specifically, another llm simulates a person and after each reply outputs two numbers: how upset they are and how much they trust you. the cool thing is that these 2 numbers are basically a reward signal that you can use to RL a model, which they did to much success.
2
2
20
841
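To make the "two numbers as a reward signal" idea concrete, here is a minimal sketch that turns a sequence of simulated-person states into per-turn rewards. The tweet does not specify the scale of the two numbers or how they are combined, so the additive form, the weights `w_upset` and `w_trust`, and the names `SimulatedState` and `turn_rewards` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SimulatedState:
    """What the simulated person reports after a reply (scale is assumed)."""
    upset: float
    trust: float


def turn_rewards(states, w_upset=1.0, w_trust=1.0):
    """Reward each reply by how much it lowered upset and raised trust.

    states[0] is the state before the first reply; states[i] is the state
    after reply i. Returns one reward per reply.
    """
    rewards = []
    for prev, cur in zip(states, states[1:]):
        drop_in_upset = prev.upset - cur.upset
        gain_in_trust = cur.trust - prev.trust
        rewards.append(w_upset * drop_in_upset + w_trust * gain_in_trust)
    return rewards


# Hypothetical three-state conversation: the person calms down and warms up.
states = [SimulatedState(8, 2), SimulatedState(6, 3), SimulatedState(3, 6)]
print(turn_rewards(states))
```

Rewarding the change per turn rather than the final state is one possible design choice; it gives credit to each reply separately, which may or may not be what the referenced work did.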
girish
girish@googrish·
@_TarunKathuria thanks so much for detailed response sir :) i've never played around with jax myself so gonna go check it out (esp. given ur point on logprob inconsistency)
0
0
1
104
Tarun Kathuria
Tarun Kathuria@_TarunKathuria·
I think the main advantage you get is state and sharding being explicit along with compilation boundaries being enforced. This for instance is good because it's always easy to know where policy params, KV caches (which frequently need to be flushed and can be a source of bottlenecks to be identified) and rollouts are located.

Another big one is async dispatch. This is actually quite useful imo - you want the enqueueing generation, reward scoring, logprob (re)comp to all happen while keeping host coordination easy to manage so for instance the enqueueing TPUs can run ahead of the actual execution.

Also like here's some failure modes that are top of mind for me: like using stale weights, or logprob inconsistency or generation throughput dropping randomly for reasons that are annoying to debug (which people add all sorts of hacks to fix for imo) but you can just enforce certain invariants in jax easily to make this less likely or at least easier to debug and isolate?

Finally the mental model for me is cleaner: define the model(s) and the sampler, how the sharding checkpoints is happening and the rollout buffering and then a standard coordinator. In torch, at least it seems you need to have a lot more customized stuff in terms of rollout logic, how the logging and token packing is happening.

I kind of want to also add that this is just me trying to learn this myself and maybe there are best practices in torch I've missed or maybe I just like thinking in functional paradigm to be more natural given my background and certainly not that jax is fundamentally superior to torch and you can mess up both ways. Just that certain things seem easier/intuitive to me in jax than in torch. But there are still many challenges in async rl that seem to be of similar complexity in both.
1
1
9
950
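One of the failure modes named above is training on rollouts generated by stale weights. The sketch below shows the kind of invariant that could guard against it. It is plain Python and framework-agnostic, not JAX code, and `Rollout`, `RolloutBuffer`, and `max_lag` are hypothetical names for illustration, not part of any library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rollout:
    tokens: tuple
    policy_version: int  # version of the weights that generated this rollout


@dataclass
class RolloutBuffer:
    max_lag: int = 1  # how many versions behind the learner a rollout may be
    items: list = field(default_factory=list)

    def add(self, rollout, current_version):
        """Accept the rollout only if its weights are fresh enough.

        Returns True if stored, False if dropped as stale. A rollout tagged
        with a version newer than the learner's indicates a bookkeeping bug,
        so that case raises instead of being silently dropped.
        """
        lag = current_version - rollout.policy_version
        if lag < 0:
            raise ValueError("rollout is tagged with a future policy version")
        if lag > self.max_lag:
            return False
        self.items.append(rollout)
        return True


buf = RolloutBuffer(max_lag=1)
fresh = buf.add(Rollout(tokens=(1, 2), policy_version=5), current_version=5)
stale = buf.add(Rollout(tokens=(3,), policy_version=3), current_version=5)
print(fresh, stale, len(buf.items))
```

Tagging every rollout with the version that produced it makes staleness an explicit, checkable property rather than something inferred from timing.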
Tarun Kathuria
Tarun Kathuria@_TarunKathuria·
My hot take is that async RL is far easier and more robust to do in JAX than Torch (as someone who’s been trying to practice in both as of late) and that people should build more solid open source RL infrastructure in JAX (even if your base model checkpoints are originally torch tensors)
10
6
153
20K
girish
girish@googrish·
very well-written piece. "Private reinforcement learning environments should let models grow stronger on real traces from inside the organization. Its knowledge base makes institutional memory queryable and use of tokens more efficient. This loop becomes the new IP of the firm. I think of it as a hill climbing machine. And unlike most assets, it compounds."
Satya Nadella@satyanadella

x.com/i/article/2065…

0
4
33
4.9K
girish
girish@googrish·
@bonlee__ @sparab22 give it a go! exporting the starter template and letting claude code go wild has worked surprisingly well - let me know if you need help with setup :)
1
0
1
56
girish
girish@googrish·
with the events around fable, it’s clear that companies & developers need to own their models. the ability to post-train & rl models must become a more broadly accessible skill. rl fine tuning sounds like rocket science, but it really isn’t. so @Thariq_q (not the claude code guy :D) made a video that explains it with as little jargon as possible👇
5
5
28
1.9K