girish (@googrish) - Twitter Profile

Pinned Tweet

girish@googrish·6d

“don’t train your own model” is common ai advice. it's wrong. your token bill's the proof.

today, we’re excited to launch castform into open preview. castform is the easiest way for you to train your own model, on your own data.

open-weights models are performant and much cheaper. when trained on your task & proprietary data, they beat closed models. the thing standing between you and that was weeks of plumbing & years of ml expertise.

with castform, model training is as simple as prompt engineering. @castformai

bring your agent traces or raw corpora. castform turns it into training data, picks the right algorithmic recipes, manages gpus, and gives you an ide to watch and chat with your model as it learns.

see what you can build with castform👇

English

228

236

2.6K

442.3K

girish@googrish·9h

@matt_slotnick but isn't the point of opensource that "serving" can be distributed? instead of a few players trying to serve for hyperscale, you get lots of folks serving models for narrower segments

English

1

0

195

Matt Slotnick@matt_slotnick·16h

i’d bet a nontrivial amount of the frontier lab value prop is actually in the infrastructure to serve intelligence reliably and at scale. people expecting open source to massively cannibalize frontier spend are going to be waiting a while. massive scale infrastructure is hard

English

5

1

30

5K

girish@googrish·20h

@ApplyWiseAi scaled it by frequency :)

English

0

22

Samian@ApplyWiseAi·20h

@googrish ok the die rolling thing is a genuinely underrated eval. clean toy problem with obvious ground truth. curious what your reward signal looked like.. straight KL on uniform or something sneakier?

English

1

0

1

32

girish@googrish·21h

lots of talk about agi, asi, rsi but ask any frontier LLM to roll a die and it will almost always say "4." claude, gpt, kimi - doesn't matter, 4.4.4.4. so here's how i post-trained a model to reliably roll a die (i.e. each number ~1/6th of the time) & why it's a nice sandbox for one of the most interesting problems in rl i.e. getting a model to actually explore instead of just following strategies it already knows 🧵

English

2

5

16

768
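
One way to quantify "each number ~1/6th of the time" is the KL divergence between the empirical roll frequencies and the uniform distribution. The sketch below is an assumption about how one might measure this; the thread does not say which metric was used, and `kl_from_uniform` is a hypothetical helper name:

```python
import math
from collections import Counter


def kl_from_uniform(rolls, num_faces=6):
    """KL(empirical || uniform) in nats; 0.0 means perfectly uniform rolls."""
    if not rolls:
        raise ValueError("need at least one roll")
    counts = Counter(rolls)
    total = len(rolls)
    kl = 0.0
    for face in range(1, num_faces + 1):
        p = counts[face] / total
        if p > 0:  # faces never rolled contribute 0 to the sum
            # p * log(p / (1 / num_faces)) == p * log(p * num_faces)
            kl += p * math.log(p * num_faces)
    return kl


if __name__ == "__main__":
    print(kl_from_uniform([4] * 60))              # fully collapsed on "4"
    print(kl_from_uniform([1, 2, 3, 4, 5, 6] * 10))  # perfectly balanced
```

A model that always answers 4 scores ln 6 (about 1.79 nats), the maximum for a six-sided die, while perfectly balanced rolls score 0.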

girish@googrish·21h

more details in our blogpost: castform.com/blog/rl-divers…

English

0

2

51

girish@googrish·21h

two open questions remain:

1. clustering: exact match works for dice, but how do you identify genuinely novel reasoning paths in long agentic traces?

2. what if the right answer is far from base capabilities? reward shaping still needs the model to stumble there first.

English

1

0

59

girish@googrish·2d

arxiv.org/abs/2606.15532…

ZXX

0

1

276

girish@googrish·2d

cool application of llm post-training: improving the "emotional intelligence" of a language model

most other emotional intelligence evals tend to just check if a model can indicate what a person's feeling. this one's cooler -> checks if a model can actually change a person's mood across a multi-turn conversation.

specifically, another llm simulates a person and after each reply outputs two numbers: how upset they are and how much they trust you.

the cool thing is that these 2 numbers are basically a reward signal that you can use to RL a model, which they did to much success.

English

2

20

841
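
To make the "two numbers as a reward signal" idea concrete, here is a minimal sketch that turns per-turn (upset, trust) scores from a simulated user into one scalar reward for the whole conversation. The [0, 1] score range, the equal weights, and the first-to-last-turn difference are all assumptions for illustration; the tweet does not describe the paper's actual reward formulation:

```python
def conversation_reward(turn_scores, upset_weight=0.5, trust_weight=0.5):
    """Scalar reward from a list of (upset, trust) pairs, one per user turn.

    Scores are assumed to lie in [0, 1]. The reward is positive when the
    simulated user ends less upset and more trusting than they started.
    """
    if len(turn_scores) < 2:
        return 0.0  # no change can be measured from a single turn
    first_upset, first_trust = turn_scores[0]
    last_upset, last_trust = turn_scores[-1]
    upset_drop = first_upset - last_upset   # higher is better
    trust_gain = last_trust - first_trust   # higher is better
    return upset_weight * upset_drop + trust_weight * trust_gain


if __name__ == "__main__":
    # Hypothetical scores from a simulated user over three turns.
    scores = [(0.8, 0.2), (0.5, 0.4), (0.2, 0.7)]
    print(conversation_reward(scores))
```

Scoring only the change between the first and last turn keeps the sketch simple; a per-turn reward would give denser feedback at the cost of more design choices.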

girish@googrish·2d

@playfuldesignco @castform beautiful!

English

0

13

Playful@playfuldesignco·2d

@googrish Something like this for @castform? share.playfuldesign.app/a269445e-9cb7-…

English

1

0

2

17

girish@googrish·2d

@playfuldesignco generate an event page with a dumbo octopus on it

chloe@barofclo

AI put coding in your pocket. Now it’s design’s turn. Meet Playful, the designer on your phone. Create when inspiration hits, from a party or on the couch. Have an upcoming party / event? We're creating the next 888 event covers for free. RT + tag @playfuldesignco with your request⭐

English

1

0

3

431

girish@googrish·3d

@_TarunKathuria thanks so much for detailed response sir :) i've never played around with jax myself so gonna go check it out (esp. given ur point on logprob inconsistency)

English

0

1

104

Tarun Kathuria@_TarunKathuria·3d

I think the main advantage you get is state and sharding being explicit along with compilation boundaries being enforced. This for instance is good because it's always easy to know where policy params, KV caches (which frequently need to be flushed and can be a source of bottlenecks to be identified) and rollouts are located.

Another big one is async dispatch. This is actually quite useful imo - you want the enqueueing generation, reward scoring, logprob (re)comp to all happen while keeping host coordination easy to manage so for instance the enqueueing TPUs can run ahead of the actual execution.

Also like here's some failure modes that are top of mind for me: like using stale weights, or logprob inconsistency or generation throughput dropping randomly for reasons that are annoying to debug (which people add all sorts of hacks to fix for imo) but you can just enforce certain invariants in jax easily to make this less likely or at least easier to debug and isolate?

Finally the mental model for me is cleaner: define the model(s) and the sampler, how the sharding checkpoints is happening and the rollout buffering and then a standard coordinator. In torch, at least it seems you need to have a lot more customized stuff in terms of rollout logic, how the logging and token packing is happening.

I kind of want to also add that this is just me trying to learn this myself and maybe there are best practices in torch I've missed or maybe I just like thinking in functional paradigm to be more natural given my background and certainly not that jax is fundamentally superior to torch and you can mess up both ways. Just that certain things seem easier/intuitive to me in jax than in torch. But there are still many challenges in async rl that seem to be of similar complexity in both.

English

1

9

950
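
The "stale weights" failure mode mentioned above can be guarded with an explicit invariant on the rollout buffer. The sketch below is framework-agnostic plain Python, not JAX, and is not taken from the thread; the names `Rollout`, `RolloutBuffer`, and `max_lag` are hypothetical. It assumes each rollout is tagged with the version of the policy weights that generated it:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    tokens: tuple
    policy_version: int  # version of the weights that generated this rollout


class RolloutBuffer:
    """Buffer that refuses rollouts generated by weights that are too old."""

    def __init__(self, max_lag=1):
        self.max_lag = max_lag  # how many versions behind a rollout may be
        self.items = []
        self.dropped = 0

    def add(self, rollout, current_version):
        lag = current_version - rollout.policy_version
        if lag < 0:
            # A rollout from the "future" signals a bookkeeping bug, so fail loudly.
            raise ValueError("rollout is newer than the trainer's weights")
        if lag > self.max_lag:
            self.dropped += 1  # too stale to train on; count it for debugging
            return False
        self.items.append(rollout)
        return True


if __name__ == "__main__":
    buf = RolloutBuffer(max_lag=1)
    print(buf.add(Rollout((1, 2, 3), policy_version=5), current_version=5))
    print(buf.add(Rollout((4, 5), policy_version=3), current_version=5))
    print(buf.dropped)
```

Counting dropped rollouts instead of silently discarding them makes it easier to spot when generation is falling too far behind training.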

Tarun Kathuria@_TarunKathuria·3d

My hot take is that async RL is far easier and more robust to do in JAX than Torch (as someone who’s been trying to practice in both as of late) and that people should build more solid open source RL infrastructure in JAX (even if your base model checkpoints are originally torch tensors)

English

10

6

153

20K

girish@googrish·3d

very well-written piece. "Private reinforcement learning environments should let models grow stronger on real traces from inside the organization. Its knowledge base makes institutional memory queryable and use of tokens more efficient. This loop becomes the new IP of the firm. I think of it as a hill climbing machine. And unlike most assets, it compounds."

Satya Nadella@satyanadella

x.com/i/article/2065…

English

0

4

33

4.9K

girish@googrish·4d

@iamgunpathe @castformai coming soon!

English

0

1

10

V G Subramanian@iamgunpathe·4d

@googrish @castformai Any video of training model from scratch would be helpful for beginners like me. thanks in advance

English

1

0

1

10

girish@googrish·5d

💯 agree. we are building @castformai to empower any developer in the world to be able to do this

Cohere@cohere

When you rent your artificial intelligence, you have no control, and no choice. This is why sovereignty and ownership matters. Whether it means using your own hardware, open source, or deep customization. Own your AI, own your future.

English

2

14

1.8K

girish@googrish·4d

@bonlee__ @sparab22 give it a go! exporting the starter template and letting claude code go wild has worked surprisingly well - let me know if you need help with setup :)

English

1

0

1

56

bonnie@bonlee__·4d

@googrish @sparab22 Can’t wait to try! Is it suitable for non technical peeps?

English

1

0

1

20

girish@googrish·4d

@anwarlemu_ @Thariq_q haha the democratization of post-training is important nevertheless :)

English

0

1

38

Anwar Lemu@anwarlemu_·4d

@googrish @Thariq_q Haha. Fable crash came at the perfect time for you guys

English

1

0

1

46

girish@googrish·4d

with the events around fable, it’s clear that companies & developers need to own their models. the ability to post-train & rl models must become a more broadly accessible skill. rl fine tuning sounds like rocket science, but it really isn’t. so @Thariq_q (not the claude code guy :D) made a video that explains it with as little jargon as possible👇

English

5

28

1.9K
