Kion

30 posts

@OKfallah

https://t.co/Fmva8l80Ln: online continual learning from interactions. Exploring @southpkcommons. Prev @Waabi_ai @mlatgt.

San Francisco, CA · Joined March 2015

347 Following · 138 Followers

Kion retweeted

Alex Cui@alexcdot·2d

It sounds like someone at @EYnews needs to get fired 😰 We found a 44-page report by EY where 60% of the citations were AI hallucinations. Those fabricated claims were plagiarized from ANOTHER report by some web3 company 🤡 And now, ChatGPT cites that EY report like it's gospel. We are all so so screwed. How does a firm that makes $53 billion/year 💸 get away with this?? 👇 Here's how


Kion@OKfallah·4 Apr

@cudagdb lfg

Kion retweeted

Alex Zhurkevich@cudagdb·4 Apr

Trtllmgen kernels are now open. Fastest prefill and decode kernels for our target workloads. We wrote these to win InferenceX, MLPerf, other benchmarks. Powering some of today’s top served models. Dive in, learn, use them, or level up your own. Enjoy. github.com/flashinfer-ai/…

Kion@OKfallah·25 Mar

Check out CLaaS, built on Tinker, for an OpenClaw that improves as you use it!

Tinker@tinkerapi

Continual Learning as a Service deploys a model that collects user feedback and distills the feedback into model weights via Tinker. CLaaS includes a dashboard for monitoring batches and an eval harness to make the training more deliberate. x.com/OKfallah/statu…

Kion retweeted

Silen Naihin@silennai·25 Mar

@okfallah and I built an open-source repo to apply continual learning to autoresearch with self-distillation policy optimization (SDPO). We managed to beat Karpathy's baseline by recursively self-improving Qwen3 14b on 8xH100s. Results and learnings 👇

Andrej Karpathy@karpathy

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then: - the human iterates on the prompt (.md) - the AI agent iterates on the training code (.py) The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)

Kion@OKfallah·13 Mar

@DimitrisPapail Now it's a question of how we can get the model to boost the sample efficiency of that self-improvement loop

Dimitris Papailiopoulos@DimitrisPapail·13 Mar

We're on the cusp of models that post-train themselves to get better. It's not speculative anymore. The self-improvement loop is actually closing..

Kion@OKfallah·13 Mar

@AlexFinn If you want to make your local claw improve from interactions without wasting context on prompts and memory, check out this open source continual learning tool: github.com/kfallah/CLaaS

Alex Finn@AlexFinn·12 Mar

If you have your OpenClaw working 24/7 using frontier models like Opus, you're easily burning $300 a day. That's $100,000 a year. I have 3 Mac Studios and a DGX Spark running 4 high-end local models (Nemotron 3, Qwen 3.5, Kimi K2.5, MiniMax2.5). They're chugging 24/7/365. I spent a third of that yearly cost to buy these computers. I'll be able to use them for years for free. On top of that, they're completely private, secure, and personalized. Not a single prompt goes to a cloud server that can be read by an employee or used to train another model. I hope this makes it painfully obvious why local is the future for AI agents, and why America needs to enter the local AI race.

Kion@OKfallah·26 Feb

Check it out, fully open-source: github.com/kfallah/CLaaS

Kion@OKfallah·26 Feb

In-context learning is a hack to remind your model. CLaaS uses self-distillation to move that knowledge into weights, freeing up context.
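One common way to move in-context knowledge into weights is context distillation: run the same model twice, once as a "teacher" that sees the feedback or memory in its prompt and once as a "student" that does not, then update the student's weights to minimize the KL divergence from the teacher's next-token distributions. The pure-Python sketch below computes only that loss for given logits. The function names and setup are hypothetical, and CLaaS's actual objective may differ.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def context_distillation_loss(teacher_logits: Sequence[Sequence[float]],
                              student_logits: Sequence[Sequence[float]]) -> float:
    """Mean per-position KL(teacher || student) over next-token distributions.

    Hypothetical setup: teacher_logits[t] are the logits at response position t
    when the feedback/memory is in the prompt; student_logits[t] are the logits
    at the same position without that extra context.
    """
    if len(teacher_logits) != len(student_logits) or not teacher_logits:
        raise ValueError("need equal, non-empty sequences of logits")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        lp_t = log_softmax(t_row)
        lp_s = log_softmax(s_row)
        # KL(p_t || p_s) = sum_v p_t(v) * (log p_t(v) - log p_s(v))
        total += sum(math.exp(a) * (a - b) for a, b in zip(lp_t, lp_s))
    return total / len(teacher_logits)
```

In a real training loop the teacher logits would be treated as constants (no gradient) and the loss backpropagated through the student with an autodiff framework; once the loss is low, the student behaves as if the context were present without spending tokens on it.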

Kion@OKfallah·8 Jan

@cwolferesearch Cool paper. Reminds me of the recent work arguing lower KL losses can explain less forgetting during RFT. But it seems that you don’t even need a KL loss if your base policy has low enough reward variance?

Cameron R. Wolfe, Ph.D.@cwolferesearch·7 Jan

RL is actually pretty good at continual learning (depending on how you define it). These results make me feel like RL-based continual learning is quite achievable...

Continual learning has recently become a popular topic of discussion. To maximize the utility of an LLM, the model needs to actively learn as it is used (i.e., on-the-job learning like humans). Currently, that utility is difficult to unlock because LLMs struggle to adapt or improve autonomously as they are used.

Setup. As a proxy for continual learning, consider a continual post-training setup with T datasets {D_1, D_2, ..., D_T}. Given a base model (e.g., an existing Instruct model), we can sequentially train over each of these datasets. This mimics continual learning behavior, where the LLM is exposed to and trained on new tasks over time. This is (obviously) not a perfect proxy, but it's an informative empirical setup.

Metrics. To evaluate the model's continual learning ability, we can consider two metrics:
1. Average accuracy: the model's accuracy on the test sets of all datasets after continual post-training completes.
2. Forgetting measure: the average difference between a task's final accuracy and the best accuracy achieved on that task during continual training.

When we compare multi-task SFT, sequential SFT, and sequential RL (GRPO / RLOO) over a set of downstream training datasets, we see that:
(1) SFT leads to catastrophic forgetting of previously learned tasks during continual post-training, with a final average accuracy of 54% (much lower than the 62.9% of multi-task SFT).
(2) SFT degrades general capabilities too, e.g., 52.1% → 40.1% on MMMU.
(3) RL is naturally robust to forgetting. The forgetting measure is only -2.3%, and final average accuracy is 60%, close to the 62.9% upper bound achieved by multi-task SFT.
(4) RL maintains (and even improves) general model capabilities. For example, MMMU improves by 2.1% and POPE improves by 1.9% with sequential GRPO-based RL training.

RL seems to be very good at naturally maintaining and adding to model capabilities over time! No replay buffers were used for these results. As a disclaimer, there is still a ton of work left to figure out how continual learning would be defined in the real world, and the sim-to-real gap between proxy benchmarks like this and real life may be huge.

"Without any data replay, continual post-training with RL can achieve comparable performance with that of multi-task training, which is not achievable even when equipping SFT with continual learning strategies."
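The two metrics in the quoted thread can be computed from an accuracy matrix. A minimal Python sketch, assuming acc[i][j] holds the test accuracy on task j after sequentially training on D_1 through D_{i+1}; the function names and the toy numbers are hypothetical. Taking "best" only from stages at or after a task was trained, and averaging forgetting over the first T-1 tasks, are assumptions rather than details the thread specifies. Forgetting is reported as final minus best, so it is negative when the model forgets, matching the sign of the -2.3% figure.

```python
from typing import List


def average_accuracy(acc: List[List[float]]) -> float:
    """Metric 1: mean test accuracy over all T tasks after the last stage.

    acc[i][j] = accuracy on task j's test set after sequentially training on
    D_1 .. D_{i+1} (a T x T layout assumed for this sketch).
    """
    final_row = acc[-1]
    return sum(final_row) / len(final_row)


def forgetting_measure(acc: List[List[float]]) -> float:
    """Metric 2: mean of (final accuracy - best accuracy) per task.

    Negative values indicate forgetting. Assumptions: "best" is taken over
    stages at or after the task was trained, and the average excludes the
    last task, which has no later stage in which to be forgotten.
    """
    T = len(acc)
    if T < 2:
        return 0.0
    gaps = []
    for j in range(T - 1):
        best = max(acc[i][j] for i in range(j, T))
        gaps.append(acc[-1][j] - best)
    return sum(gaps) / len(gaps)


# Toy 3-task accuracy matrix (made-up numbers, not from the thread).
acc = [
    [0.80, 0.10, 0.10],
    [0.70, 0.90, 0.20],
    [0.60, 0.85, 0.95],
]
print(average_accuracy(acc))    # ≈ 0.80
print(forgetting_measure(acc))  # ≈ -0.125
```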

Kion@OKfallah·16 Sep

@athleticKoder Great thread, I totally agree that small models + RL multi-turn rollouts is the way. I think SFT on example traces in the environment is an important piece too.

anshuman@athleticKoder·15 Sep

You're in a Research engineer interview at OpenAI, and the interviewer asks: "How do you train your model for Computer Use? Can RL solve this? " Here's how you can answer:

Kion@OKfallah·4 Jan

@0xluffy @scaling01 Look at recent citations on important papers in the subfield

luffy@0xluffy·4 Jan

how do you guys keep up with frontier research rn? is there an easier way (than arxiv) to find all these papers or is there a knowledge graph that aggregates it? rn only following people like @scaling01 closely

Kion retweeted

Christopher Rozell@crozSciTech·2 Nov

Please RT! Georgia Tech ECE is hiring faculty in bioengineering with a preference for candidates aligned with an emerging institute for neuroscience, neurotech and society: b.gatech.edu/3QIvgnZ. Details below. Apply here by Dec 15: bit.ly/3QIUaU8

Kion retweeted

Raquel Urtasun@RaquelUrtasun·21 Sep

So excited to share @Waabi_ai strategic partnership with @UberFreight to accelerate the safe deployment of #AI-powered autonomous trucks at scale. Huge step toward the future of safer roads and more efficient supply chains. waabi.ai/waabi-uber-fre…

Kion retweeted

Christopher Rozell@crozSciTech·20 Sep

Excited to share this paper using new neurotech and explainable AI to advance DBS for treatment resistant #depression (TRD). Team effort including @HelenMaybergMD, @sankar_alagapan and Patricio Riva-Posse. Summary thread! go.nature.com/48lmlzC @nature PC: Mike Halerz

Kion@OKfallah·24 Jul

We also find that the Lie group operators can be useful for downstream tasks like semi-supervised learning. Again, check out the preprint for more results: arxiv.org/abs/2306.13544. Thank you to my collaborators @alec_helbling, @KyleAJohnsen, & @crozSciTech!

Kion@OKfallah·24 Jul

Our approach is compatible with any InfoNCE loss. When incorporated into SimCLR, we see consistent improvements in linear separability. We find that we can even match performance when training without a projection head!
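For reference, a minimal pure-Python sketch of a one-directional InfoNCE objective as used in contrastive learning: each anchor's positive is the matching row of the other view, and the other rows in the batch act as negatives. This illustrates InfoNCE only, with hypothetical names and an assumed temperature of 0.5. SimCLR's NT-Xent variant additionally uses same-view negatives and symmetrizes the loss, and ManifoldCLR's Lie-group feature augmentations are not shown.

```python
import math
from typing import List, Sequence


def _l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def info_nce(anchors: Sequence[Sequence[float]],
             positives: Sequence[Sequence[float]],
             temperature: float = 0.5) -> float:
    """One-directional InfoNCE loss averaged over the batch.

    anchors[i] and positives[i] embed two views of example i. For anchor i,
    positives[i] is the positive and positives[k], k != i, are negatives.
    """
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("need equal, non-empty batches")
    a = [_l2_normalize(v) for v in anchors]
    p = [_l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Temperature-scaled cosine similarity to every candidate in the batch.
        logits = [sum(x * y for x, y in zip(ai, pk)) / temperature for pk in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log of the softmax probability assigned to the true positive.
        total += log_denom - logits[i]
    return total / len(a)


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Matching views give a lower loss than mismatched ones (aligned < swapped), which is the signal any augmentation scheme, fixed or learned, feeds into.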

Kion@OKfallah·24 Jul

Isn't it strange that most contrastive methods only use a fixed set of augmentations? Check out our work on ManifoldCLR, a system for using geometric models to generate feature augmentations and improve contrastive learning performance! arXiv: arxiv.org/abs/2306.13544. Details🧵👇
