castform

10 posts

@castformai

the post-training platform for the ai engineer

Joined June 2025

0 Following · 1.1K Followers

castform@castformai·1d

teaching castie to roll a die with reinforcement learning:

girish@googrish

lots of talk about agi, asi, rsi but ask any frontier LLM to roll a die and it will almost always say "4." claude, gpt, kimi - doesn't matter, 4.4.4.4. so here's how i post-trained a model to reliably roll a die (i.e. each number ~1/6th of the time) & why it's a nice sandbox for one of the most interesting problems in rl i.e. getting a model to actually explore instead of just following strategies it already knows 🧵
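
One way to make "each number ~1/6th of the time" measurable is to compare the empirical roll frequencies with a fair die. The sketch below uses total variation distance as that yardstick; the function name and the choice of metric are illustrative assumptions, not something taken from the thread.

```python
from collections import Counter


def total_variation_from_uniform(rolls, sides=6):
    """Total variation distance between observed roll frequencies and a fair die.

    Hypothetical helper for illustration: 0.0 means perfectly uniform,
    values near 1.0 mean the rolls collapse onto a single face.
    """
    if not rolls:
        raise ValueError("need at least one roll")
    counts = Counter(rolls)
    n = len(rolls)
    # Compare each face's observed share with the ideal 1/sides share.
    return 0.5 * sum(
        abs(counts.get(face, 0) / n - 1 / sides) for face in range(1, sides + 1)
    )


# A model that always answers "4" versus one that spreads its answers evenly.
always_four = [4] * 600
balanced = [face for face in range(1, 7) for _ in range(100)]

print(total_variation_from_uniform(always_four))  # about 0.83 (exactly 5/6)
print(total_variation_from_uniform(balanced))     # 0.0
```

A score like this could be tracked over sampled batches during training to see whether the model is actually spreading its answers or still collapsing onto one face.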

220

castform reposted

girish@googrish·5d

with the events around fable, it’s clear that companies & developers need to own their models. the ability to post-train & rl models must become a more broadly accessible skill. rl fine tuning sounds like rocket science, but it really isn’t. so @Thariq_q (not the claude code guy :D) made a video that explains it with as little jargon as possible👇

castform@castformai·6d

woohoo free credits to finetune your own custom model :)

girish@googrish

one of the most exciting surprises from the launch has been seeing just how diverse the developer use cases are for custom models. human creativity is really unmatched :) we really want to help bring more of these ideas to life. so, we’re giving away $1,000 in free credits to folks with compelling ideas they want to train. just reply to the post with your idea, or email us at castie[at]castform[dot]com

2.1K

castform@castformai·11 Jun

castform is in open beta! our goal’s to enable any developer to post-train their own llms. in a world of rapidly rising llm costs & providers guarding capabilities, we believe the ability to shape model behavior shouldn’t be a privilege. this release today is our small step towards fixing that. $50 in free credit for new users. hope you’ll give it a try 🙂

girish@googrish

“don’t train your own model” is common ai advice. it's wrong. your token bill's the proof. today, we’re excited to launch castform into open preview. castform is the easiest way for you to train your own model, on your own data. open-weights models are performant and much cheaper. when trained on your task & proprietary data, they beat closed models. the thing standing between you and that was weeks of plumbing & years of ml expertise. with castform, model training is as simple as prompt engineering. @castformai bring your agent traces or raw corpora. castform turns them into training data, picks the right algorithmic recipes, manages gpus, and gives you an ide to watch and chat with your model as it learns. see what you can build with castform👇

7.8K

castform reposted

girish@googrish·25 May

x.com/i/article/2058…

5.1K

castform@castformai·15 May

how to build more robust, anti-adversarial models with rl 👇

Thariq@Thariq_q

we trained Qwen3.5-4B with RL to get itself to comply with requests about making meth and stealing credit cards. then we used the attacks that worked to train the model’s defenses, and repeated the loop - fully automated red-teaming. defense rate went from 64% → 92%.

1.7K

castform@castformai·11 May

learn more on how to speed up llm rl for long-prompt/short-response datasets

girish@googrish

we got a 7.5x speedup on llm rl training for long-prompt, short-response workloads with a simple trick. most open source RL engines pack sequences naively: prompt + response, repeated for every sample in the group. With 1000-token prompts and 100-token responses at G=8, you're processing 8800 tokens when only 1800 are unique. ~5x wasted compute.
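
The token arithmetic in the post can be checked with a few lines. This is only the accounting behind the "8800 vs 1800" figures, assuming "unique" means each prompt is counted once per group; it is not a description of how any particular RL engine packs sequences, and the function name is hypothetical.

```python
def packed_token_counts(prompt_len, response_len, group_size):
    """Return (naive, unique) token counts for one group of samples.

    naive:  the prompt is repeated alongside every response in the group.
    unique: the prompt is counted once, plus every response.
    """
    naive = group_size * (prompt_len + response_len)
    unique = prompt_len + group_size * response_len
    return naive, unique


# Figures from the post: 1000-token prompts, 100-token responses, G=8.
naive, unique = packed_token_counts(1000, 100, 8)
print(naive, unique, round(naive / unique, 2))  # 8800 1800 4.89
```

The ratio grows with both the prompt-to-response length ratio and the group size, which is why long-prompt, short-response workloads are the ones where naive packing costs the most.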

1.1K

castform@castformai·8 May

we let our engineers play pokemon at work. we also ship faster than ever. these two facts are related. learn how we're 10x'ing engineering output:

Thariq@Thariq_q

I got tired of managing 8 Claude Code tabs, so I built Pokegents, an open source multi-agent workspace for coding agents. It has a Pokémon-themed dashboard/chat UI, persistent agent identities, MCP messaging, notifications, session cloning, and a local orchestration server.

918

castform reposted

girish@googrish·31 Oct

we built an easy way to parallelize your rl environments across any cloud using @skypilot_org, along with a new integration with skyrl by @NovaSkyAI. check out our multinode update to benchmax

1.3K

castform reposted

girish@googrish·11 Jun

1/ Can codebase-specific RL push the frontier for code LLMs? At @cgftlabs, we helped a client RL-tune Qwen-2.5-7B on their internal codebase for unit test creation, with coverage-guided GRPO. The result? It beats o4-mini & o3. Here’s how it works (link to full blog in bio) 🧵
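
As a rough illustration of what "coverage-guided GRPO" could look like: GRPO scores each sampled completion relative to the other completions in its group, so one plausible setup uses the test coverage achieved by each generated test file as the reward. The sketch below assumes exactly that; the reward definition, the numbers, and the function name are hypothetical, and the actual recipe is in the linked blog.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group (GRPO-style).

    Each advantage is (reward - group mean) / (group std + eps), so
    completions are judged against their siblings, not on an absolute scale.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical coverage fractions for four generated test suites
# sampled from the same prompt (assumed reward, not from the post).
coverage = [0.20, 0.40, 0.60, 0.80]
for cov, adv in zip(coverage, group_relative_advantages(coverage)):
    print(f"coverage={cov:.2f} advantage={adv:+.2f}")
```

Completions with above-average coverage get positive advantages and are reinforced; a group where every completion reaches the same coverage yields zero advantages and contributes no gradient signal.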

12.6K
