Kion

25 posts

@OKfallah

https://t.co/Fmva8l80Ln: online continual learning from interactions. Exploring @southpkcommons. Prev @Waabi_ai @mlatgt.

San Francisco, CA · Joined March 2015
334 Following · 137 Followers
Kion
Kion@OKfallah·
@DimitrisPapail Now it's a question of how we can get the model to boost the sample efficiency of that self-improvement loop
English
0
0
0
102
Dimitris Papailiopoulos
Dimitris Papailiopoulos@DimitrisPapail·
We're on the cusp of models that post-train themselves to get better. It's not speculative anymore. The self-improvement loop is actually closing.
English
34
40
701
43.3K
Kion
Kion@OKfallah·
@AlexFinn If you want to make your local claw improve from interactions without wasting context on prompts and memory, check out this open source continual learning tool: github.com/kfallah/CLaaS
English
0
0
0
62
Alex Finn
Alex Finn@AlexFinn·
If you have your OpenClaw working 24/7 using frontier models like Opus, you're easily burning $300 a day. That's roughly $100,000 a year. I have 3 Mac Studios and a DGX Spark running 4 high-end local models (Nemotron 3, Qwen 3.5, Kimi K2.5, MiniMax 2.5). They're chugging 24/7/365. I spent a third of that yearly cost to buy these computers, and I'll be able to use them for years for free. On top of that, they're completely private, secure, and personalized. Not a single prompt goes to a cloud server where it can be read by an employee or used to train another model. I hope this makes it painfully obvious why local is the future for AI agents, and why America needs to enter the local AI race.
Alex Finn tweet media
English
433
167
2.4K
379.5K
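The cost claims in the tweet above can be sanity-checked in a few lines. All figures are the tweet's own (a claimed $300/day burn rate and hardware at a third of the yearly cloud cost), not measurements:

```python
# Sanity check of the tweet's cost claims (its figures, not measurements)
cloud_per_day = 300                       # claimed frontier-API burn rate, $/day
cloud_per_year = cloud_per_day * 365      # 109_500, i.e. roughly the "$100,000 a year"
hardware_cost = cloud_per_year // 3       # "a third of that yearly cost" spent on machines
breakeven_days = hardware_cost / cloud_per_day  # days of cloud usage the hardware replaces
print(cloud_per_year, hardware_cost, round(breakeven_days))
```

So under the tweet's own numbers the machines pay for themselves in about four months of continuous use, before counting electricity.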
Kion
Kion@OKfallah·
In-context learning is a hack to remind your model. CLaaS uses self-distillation to move that knowledge into weights, freeing up context.
Kion tweet media
English
3
6
17
620
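CLaaS's internals aren't shown in the tweet above, but "self-distillation into weights" generally means matching the bare model's output distribution to the same model's distribution *with* the prompt/memory in context (often called context distillation). A minimal toy sketch over raw logit vectors, with the teacher distribution standing in for the with-context model:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def kl(p, q):
    # KL(p || q) for discrete distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_step(student_logits, teacher_probs, lr=0.5):
    # Gradient of KL(teacher || student) w.r.t. student logits is
    # softmax(student) - teacher, so each step nudges the student's
    # distribution toward the teacher's.
    s = softmax(student_logits)
    return [z - lr * (si - ti) for z, si, ti in zip(student_logits, s, teacher_probs)]

# Teacher = model *with* the prompt/memory in context; student = bare weights.
teacher = softmax([2.0, 0.0, -1.0])
student_logits = [0.0, 0.0, 0.0]
for _ in range(50):
    student_logits = distill_step(student_logits, teacher)
```

After training, the student reproduces the teacher's behavior without the context tokens, which is the sense in which the knowledge has "moved into the weights."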
Kion
Kion@OKfallah·
@cwolferesearch Cool paper. Reminds me of the recent work arguing lower KL losses can explain less forgetting during RFT. But it seems that you don’t even need a KL loss if your base policy has low enough reward variance?
English
1
0
2
137
Cameron R. Wolfe, Ph.D.
Cameron R. Wolfe, Ph.D.@cwolferesearch·
RL is actually pretty good at continual learning (depending on how you define it). These results make me feel like RL-based continual learning is quite achievable...

Continual learning has recently become a popular topic of discussion. In order to maximize the utility of an LLM, the model needs to actively learn as it is used (i.e., on-the-job learning, like humans). Currently, model utility is difficult to unlock because LLMs struggle to adapt or improve autonomously as they are used.

Setup. As a proxy for continual learning, consider a continual post-training setup with T datasets {D_1, D_2, ..., D_T}. Given a base model (e.g., an existing Instruct model), we can sequentially train over each of these datasets. This mimics continual learning behavior, where the LLM is exposed to and trained on new tasks over time. This is (obviously) not a perfect proxy, but it's an informative empirical setup.

Metrics. To evaluate the model's continual learning ability, we can consider two metrics: 1. Average accuracy of the model on the test sets for all datasets after continual post-training completes. 2. A forgetting measure that captures the average difference between the final accuracy on a task and the best accuracy achieved for that task throughout continual training.

When we compare multi-task SFT training, sequential SFT, and sequential RL (GRPO / RLOO) over a set of downstream training datasets, we see that: (1) SFT leads to catastrophic forgetting of previously learned tasks during continual post-training on downstream tasks, with a final average accuracy of 54% (much lower than multi-task learning with SFT, 62.9%). (2) SFT degrades general capabilities too; e.g., 52.1% → 40.1% on MMMU. (3) RL is naturally robust to forgetting. The forgetting measure is only -2.3%, and final average accuracy is 60%, close to the 62.9% upper bound achieved by multi-task SFT. (4) RL maintains (and even improves) general model capabilities. For example, MMMU improves by 2.1% and POPE improves by 1.9% when running sequential GRPO-based RL training.

RL seems to be very good at naturally maintaining and adding to model capabilities over time! There aren't even any replay buffers used for these results. As a disclaimer, there is still a ton of work left to figure out how continual learning would be defined in the real world, and it's possible the sim-to-real gap between proxy benchmarks like this and real life is huge. “Without any data replay, continual post-training with RL can achieve comparable performance with that of multi-task training, which is not achievable even when equipping SFT with continual learning strategies.”
Cameron R. Wolfe, Ph.D. tweet media
English
11
36
236
13.3K
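The two metrics described in the thread above are easy to make concrete. A minimal sketch, where `acc[t][k]` is a hypothetical accuracy on task k's test set after the t-th sequential training stage:

```python
def continual_metrics(acc):
    # acc[t][k]: accuracy on task k's test set after training stage t
    # (T stages over T tasks, trained sequentially)
    T = len(acc)
    final = acc[-1]
    avg_acc = sum(final) / T
    # Forgetting: final accuracy minus the best accuracy ever reached on that
    # task, averaged over tasks. Negative means the model forgot (the thread
    # reports -2.3% for sequential RL).
    forgetting = sum(
        final[k] - max(acc[t][k] for t in range(T)) for k in range(T)
    ) / T
    return avg_acc, forgetting

# Toy run: task 0 is learned first, then partially forgotten while task 1 is learned.
avg, forget = continual_metrics([[0.8, 0.1], [0.6, 0.9]])
```

Here the final average accuracy is 0.75 and the forgetting measure is -0.1, driven entirely by the drop on task 0.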
Kion
Kion@OKfallah·
@athleticKoder Great thread, I totally agree that small models + RL multi-turn rollouts is the way. I think SFT on example traces in the environment is an important piece too.
English
1
0
2
705
anshuman
anshuman@athleticKoder·
You're in a Research Engineer interview at OpenAI, and the interviewer asks: "How do you train your model for Computer Use? Can RL solve this?" Here's how you can answer:
English
8
30
740
95.3K
Kion
Kion@OKfallah·
@0xluffy @scaling01 Look at recent citations on important papers in the subfield
English
0
0
0
50
luffy
luffy@0xluffy·
how do you guys keep up with frontier research rn? is there an easier way (than arxiv) to find all these papers or is there a knowledge graph that aggregates it? rn only following people like @scaling01 closely
English
31
13
242
38.9K
Kion retweeted
Christopher Rozell
Christopher Rozell@crozSciTech·
Please RT! Georgia Tech ECE is hiring faculty in bioengineering with a preference for candidates aligned with an emerging institute for neuroscience, neurotech and society: b.gatech.edu/3QIvgnZ. Details below. Apply here by Dec 15: bit.ly/3QIUaU8
English
2
50
74
22.8K
Kion retweeted
Raquel Urtasun
Raquel Urtasun@RaquelUrtasun·
So excited to share @Waabi_ai strategic partnership with @UberFreight to accelerate the safe deployment of #AI-powered autonomous trucks at scale. Huge step toward the future of safer roads and more efficient supply chains. waabi.ai/waabi-uber-fre…
Raquel Urtasun tweet media
English
5
19
116
15.7K
Kion
Kion@OKfallah·
Our approach is compatible with any InfoNCE loss. When incorporated into SimCLR, we see consistent improvements in linear separability. We find that we can even match performance when training without a projection head!
Kion tweet media
English
1
0
1
139
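The "any InfoNCE loss" claim above is concrete enough to sketch. This is a simplified, one-directional form (SimCLR's NT-Xent is symmetric and also uses the first view's other samples as negatives), written over plain lists for clarity:

```python
import math

def info_nce(z1, z2, temperature=0.5):
    # z1[i] and z2[i] are L2-normalized embeddings of two views of sample i;
    # each positive pair (z1[i], z2[i]) is contrasted against the other z2[j].
    # Simplified one-directional form; SimCLR's NT-Xent is symmetric.
    n = len(z1)
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    loss = 0.0
    for i in range(n):
        sims = [dot(z1[i], z2[j]) / temperature for j in range(n)]
        m = max(sims)
        lse = m + math.log(sum(math.exp(s - m) for s in sims))
        loss += lse - sims[i]  # -log softmax probability of the positive pair
    return loss / n
```

Feature augmentations plug in by replacing `z2` with generated variants of `z1`, which is the slot ManifoldCLR fills; the loss itself is unchanged.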
Kion
Kion@OKfallah·
Isn't it strange that most contrastive methods only use a fixed set of augmentations? Check out our work on ManifoldCLR, a system for using geometric models to generate feature augmentations and improve contrastive learning performance! arXiv: arxiv.org/abs/2306.13544. Details🧵👇
Kion tweet media
English
1
0
11
1.4K
Kion
Kion@OKfallah·
More fun results & techniques in the paper! Check out the preprint here: arxiv.org/abs/2205.03665. If you're going to ICML, come say hi during the poster session Wednesday: icml.cc/virtual/2022/p…. Big thank you to my advisor @crozSciTech for the collab on this project! (4/4)
English
0
0
0
0
Kion
Kion@OKfallah·
We train a Gaussian & sparse VAE on CelebA, then estimate a linear dictionary in place of the DNN decoder. Dictionary entries visualize what is represented by each latent entry. Entries from the Gaussian VAE resemble transformations from Beta-VAE; sparse VAE entries resemble independent components (3/4)
Kion tweet media
English
0
0
0
0
Kion
Kion@OKfallah·
When soft-thresholding, we use a straight-through estimator to prevent numerical instability & posterior collapse (when a latent entry is zero, we still want gradient passed to the inference network). We also place a Gamma hyper-prior on the threshold variable, denoted as lambda (2/4).
Kion tweet media
English
1
0
0
0
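The motivation for the straight-through estimator in the thread above is that soft-thresholding has zero gradient wherever its output is clamped to zero. A minimal scalar sketch of both pieces (the paper's actual parameterization may differ):

```python
def soft_threshold(x, lam):
    # Shrinkage operator: exact zeros inside [-lam, lam], which is what
    # makes the inferred codes sparse.
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def soft_threshold_ste_grad(x, lam, upstream):
    # True gradient w.r.t. x is 0 whenever |x| <= lam, starving the inference
    # network of signal for zeroed latents; the straight-through estimator
    # simply passes the upstream gradient through unchanged.
    return upstream
```

In a real implementation the STE is typically expressed as `x + (soft_threshold(x, lam) - x).detach()` in an autograd framework, so the forward pass is the thresholded value while the backward pass sees the identity.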
Kion
Kion@OKfallah·
Do you like sparse coding? Do you like fast training? Check out work from @crozSciTech and me at #ICML2022 next week! We propose a variational approach to infer sparse codes by soft-thresholding samples drawn from a learned base distribution 🧵 (1/4).
Kion tweet media
English
1
0
4
0
Kion retweeted
Alec Helbling
Alec Helbling@alec_helbling·
Check out our paper at the ICLR DGM4HSD workshop this Friday. We developed a system that allows users to control the content of images synthesized with VAEs by asking users queries of the form, “do you prefer image A or image B?” Paper: openreview.net/forum?id=rNh4A…
English
1
5
17
0