Kion

25 posts

@OKfallah

https://t.co/Fmva8l80Ln: online continual learning from interactions. Exploring @southpkcommons. Prev @Waabi_ai @mlatgt.

San Francisco, CA · Joined March 2015
334 Following · 137 Followers
Kion
Kion@OKfallah·
@DimitrisPapail Now it's a question of how we can get the model to boost the sample efficiency of that self-improvement loop
English
0
0
0
102
Dimitris Papailiopoulos
Dimitris Papailiopoulos@DimitrisPapail·
We're on the cusp of models that post-train themselves to get better. It's not speculative anymore. The self-improvement loop is actually closing.
English
34
40
701
43.3K
Kion
Kion@OKfallah·
@AlexFinn If you want to make your local claw improve from interactions without wasting context on prompts and memory, check out this open source continual learning tool: github.com/kfallah/CLaaS
English
0
0
0
62
Alex Finn
Alex Finn@AlexFinn·
If you have your OpenClaw working 24/7 using frontier models like Opus, you're easily burning $300 a day. That's roughly $100,000 a year. I have 3 Mac Studios and a DGX Spark running 4 high-end local models (Nemotron 3, Qwen 3.5, Kimi K2.5, MiniMax 2.5). They're chugging 24/7/365. I spent a third of that yearly cost to buy these computers, and I'll be able to use them for years for free. On top of that, they're completely private, secure, and personalized. Not a single prompt goes to a cloud server where it can be read by an employee or used to train another model. I hope this makes it painfully obvious why local is the future for AI agents, and why America needs to enter the local AI race.
Alex Finn tweet media
English
433
167
2.4K
379.5K
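The cost claims in the tweet above can be sanity-checked in a few lines. All figures are the tweet's own (a claimed $300/day burn rate and hardware at a third of the yearly cloud cost), not measurements:

```python
# Sanity check of the tweet's cost claims (its figures, not measurements)
cloud_per_day = 300                       # claimed frontier-API burn rate, $/day
cloud_per_year = cloud_per_day * 365      # 109_500, i.e. roughly the "$100,000 a year"
hardware_cost = cloud_per_year // 3       # "a third of that yearly cost" spent on machines
breakeven_days = hardware_cost / cloud_per_day  # days of cloud usage the hardware replaces
print(cloud_per_year, hardware_cost, round(breakeven_days))
```

So under the tweet's own numbers the machines pay for themselves in about four months of continuous use, before counting electricity.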
Kion
Kion@OKfallah·
In-context learning is a hack to remind your model. CLaaS uses self-distillation to move that knowledge into weights, freeing up context.
Kion tweet media
English
3
6
17
620
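CLaaS's internals aren't shown in the tweet above, but "self-distillation into weights" generally means matching the bare model's output distribution to the same model's distribution *with* the prompt/memory in context (often called context distillation). A minimal toy sketch over raw logit vectors, with the teacher distribution standing in for the with-context model:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def kl(p, q):
    # KL(p || q) for discrete distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_step(student_logits, teacher_probs, lr=0.5):
    # Gradient of KL(teacher || student) w.r.t. student logits is
    # softmax(student) - teacher, so each step nudges the student's
    # distribution toward the teacher's.
    s = softmax(student_logits)
    return [z - lr * (si - ti) for z, si, ti in zip(student_logits, s, teacher_probs)]

# Teacher = model *with* the prompt/memory in context; student = bare weights.
teacher = softmax([2.0, 0.0, -1.0])
student_logits = [0.0, 0.0, 0.0]
for _ in range(50):
    student_logits = distill_step(student_logits, teacher)
```

After training, the student reproduces the teacher's behavior without the context tokens, which is the sense in which the knowledge has "moved into the weights."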
Kion
Kion@OKfallah·
@cwolferesearch Cool paper. Reminds me of the recent work arguing lower KL losses can explain less forgetting during RFT. But it seems that you don’t even need a KL loss if your base policy has low enough reward variance?
English
1
0
2
137
Cameron R. Wolfe, Ph.D.
Cameron R. Wolfe, Ph.D.@cwolferesearch·
RL is actually pretty good at continual learning (depending on how you define it). These results make me feel like RL-based continual learning is quite achievable...

Continual learning has recently become a popular topic of discussion. In order to maximize the utility of an LLM, the model needs to actively learn as it is used (i.e., on-the-job learning, like humans). Currently, model utility is difficult to unlock because LLMs struggle to adapt or improve autonomously as they are used.

Setup. As a proxy for continual learning, consider a continual post-training setup with T datasets {D_1, D_2, ..., D_T}. Given a base model (e.g., an existing Instruct model), we can sequentially train over each of these datasets. This mimics continual learning behavior, where the LLM is exposed to and trained on new tasks over time. This is (obviously) not a perfect proxy, but it's an informative empirical setup.

Metrics. To evaluate the model's continual learning ability, we can consider two metrics: 1. Average accuracy of the model on the test sets for all datasets after continual post-training completes. 2. A forgetting measure that captures the average difference between the final accuracy on a task and the best accuracy achieved for that task throughout continual training.

When we compare multi-task SFT training, sequential SFT, and sequential RL (GRPO / RLOO) over a set of downstream training datasets, we see that: (1) SFT leads to catastrophic forgetting of previously learned tasks during continual post-training on downstream tasks, with a final average accuracy of 54% (much lower than multi-task learning with SFT, 62.9%). (2) SFT degrades general capabilities too; e.g., 52.1% → 40.1% on MMMU. (3) RL is naturally robust to forgetting. The forgetting measure is only -2.3%, and final average accuracy is 60%, close to the 62.9% upper bound achieved by multi-task SFT. (4) RL maintains (and even improves) general model capabilities. For example, MMMU improves by 2.1% and POPE improves by 1.9% when running sequential GRPO-based RL training.

RL seems to be very good at naturally maintaining and adding to model capabilities over time! There aren't even any replay buffers used for these results. As a disclaimer, there is still a ton of work left to figure out how continual learning would be defined in the real world, and it's possible the sim-to-real gap between proxy benchmarks like this and real life is huge. “Without any data replay, continual post-training with RL can achieve comparable performance with that of multi-task training, which is not achievable even when equipping SFT with continual learning strategies.”
Cameron R. Wolfe, Ph.D. tweet media
English
11
36
236
13.3K
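The two metrics described in the thread above are easy to make concrete. A minimal sketch, where `acc[t][k]` is a hypothetical accuracy on task k's test set after the t-th sequential training stage:

```python
def continual_metrics(acc):
    # acc[t][k]: accuracy on task k's test set after training stage t
    # (T stages over T tasks, trained sequentially)
    T = len(acc)
    final = acc[-1]
    avg_acc = sum(final) / T
    # Forgetting: final accuracy minus the best accuracy ever reached on that
    # task, averaged over tasks. Negative means the model forgot (the thread
    # reports -2.3% for sequential RL).
    forgetting = sum(
        final[k] - max(acc[t][k] for t in range(T)) for k in range(T)
    ) / T
    return avg_acc, forgetting

# Toy run: task 0 is learned first, then partially forgotten while task 1 is learned.
avg, forget = continual_metrics([[0.8, 0.1], [0.6, 0.9]])
```

Here the final average accuracy is 0.75 and the forgetting measure is -0.1, driven entirely by the drop on task 0.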
Kion
Kion@OKfallah·
@athleticKoder Great thread, I totally agree that small models + RL multi-turn rollouts is the way. I think SFT on example traces in the environment is an important piece too.
English
1
0
2
705
anshuman
anshuman@athleticKoder·
You're in a Research Engineer interview at OpenAI, and the interviewer asks: "How do you train your model for Computer Use? Can RL solve this?" Here's how you can answer:
English
8
30
740
95.3K
Kion
Kion@OKfallah·
@0xluffy @scaling01 Look at recent citations on important papers in the subfield
English
0
0
0
50
luffy
luffy@0xluffy·
how do you guys keep up with frontier research rn? is there an easier way (than arxiv) to find all these papers or is there a knowledge graph that aggregates it? rn only following people like @scaling01 closely
English
31
13
242
38.9K
Kion retweeted
Christopher Rozell
Christopher Rozell@crozSciTech·
Please RT! Georgia Tech ECE is hiring faculty in bioengineering with a preference for candidates aligned with an emerging institute for neuroscience, neurotech and society: b.gatech.edu/3QIvgnZ. Details below. Apply here by Dec 15: bit.ly/3QIUaU8
English
2
50
74
22.8K
Kion retweeted
Raquel Urtasun
Raquel Urtasun@RaquelUrtasun·
So excited to share @Waabi_ai strategic partnership with @UberFreight to accelerate the safe deployment of #AI-powered autonomous trucks at scale. Huge step toward the future of safer roads and more efficient supply chains. waabi.ai/waabi-uber-fre…
Raquel Urtasun tweet media
English
5
19
116
15.7K
Kion
Kion@OKfallah·
Our approach is compatible with any InfoNCE loss. When incorporated into SimCLR, we see consistent improvements in linear separability. We find that we can even match performance when training without a projection head!
Kion tweet media
English
1
0
1
139
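The "any InfoNCE loss" claim above is concrete enough to sketch. This is a simplified, one-directional form (SimCLR's NT-Xent is symmetric and also uses the first view's other samples as negatives), written over plain lists for clarity:

```python
import math

def info_nce(z1, z2, temperature=0.5):
    # z1[i] and z2[i] are L2-normalized embeddings of two views of sample i;
    # each positive pair (z1[i], z2[i]) is contrasted against the other z2[j].
    # Simplified one-directional form; SimCLR's NT-Xent is symmetric.
    n = len(z1)
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    loss = 0.0
    for i in range(n):
        sims = [dot(z1[i], z2[j]) / temperature for j in range(n)]
        m = max(sims)
        lse = m + math.log(sum(math.exp(s - m) for s in sims))
        loss += lse - sims[i]  # -log softmax probability of the positive pair
    return loss / n
```

Feature augmentations plug in by replacing `z2` with generated variants of `z1`, which is the slot ManifoldCLR fills; the loss itself is unchanged.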
Kion
Kion@OKfallah·
Isn't it strange that most contrastive methods only use a fixed set of augmentations? Check out our work on ManifoldCLR, a system for using geometric models to generate feature augmentations and improve contrastive learning performance! arXiv: arxiv.org/abs/2306.13544. Details🧵👇
Kion tweet media
English
1
0
11
1.4K
Kion
Kion@OKfallah·
More fun results & techniques in the paper! Check out the preprint here: arxiv.org/abs/2205.03665. If you're going to ICML, come say hi during the poster session Wednesday: icml.cc/virtual/2022/p…. Big thank you to my advisor @crozSciTech for the collab on this project! (4/4)
English
0
0
0
0
Kion
Kion@OKfallah·
We train a Gaussian & sparse VAE on CelebA, then estimate a linear dictionary in place of the DNN decoder. Dictionary entries visualize what is represented by each latent entry. Entries from the Gaussian VAE resemble transformations from Beta-VAE; sparse VAE entries resemble independent components (3/4)
Kion tweet media
English
0
0
0
0
Kion
Kion@OKfallah·
When soft-thresholding, we use a straight-through estimator to prevent numerical instability & posterior collapse (when a latent entry is zero, we still want gradient passed to the inference network). We also place a Gamma hyper-prior on the threshold variable, denoted as lambda (2/4).
Kion tweet media
English
1
0
0
0
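The motivation for the straight-through estimator in the thread above is that soft-thresholding has zero gradient wherever its output is clamped to zero. A minimal scalar sketch of both pieces (the paper's actual parameterization may differ):

```python
def soft_threshold(x, lam):
    # Shrinkage operator: exact zeros inside [-lam, lam], which is what
    # makes the inferred codes sparse.
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def soft_threshold_ste_grad(x, lam, upstream):
    # True gradient w.r.t. x is 0 whenever |x| <= lam, starving the inference
    # network of signal for zeroed latents; the straight-through estimator
    # simply passes the upstream gradient through unchanged.
    return upstream
```

In a real implementation the STE is typically expressed as `x + (soft_threshold(x, lam) - x).detach()` in an autograd framework, so the forward pass is the thresholded value while the backward pass sees the identity.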
Kion
Kion@OKfallah·
Do you like sparse coding? Do you like fast training? Check out work from @crozSciTech and me at #ICML2022 next week! We propose a variational approach to infer sparse codes by soft-thresholding samples drawn from a learned base distribution 🧵 (1/4).
Kion tweet media
English
1
0
4
0
Kion retweeted
Alec Helbling
Alec Helbling@alec_helbling·
Check out our paper at the ICLR DGM4HSD workshop this Friday. We developed a system that allows users to control the content of images synthesized with VAEs by asking users queries of the form, “do you prefer image A or image B?” Paper: openreview.net/forum?id=rNh4A…
English
1
5
17
0