rain

129 posts

@rainx0r

phd student / trying to make RL work / yap into the abyss about AI research, coding, etc.

guildford, england · Joined May 2020
223 Following · 36 Followers
rain
rain@rainx0r·
I used to think this until I got nerdsniped trying to smooth out my ppo curves recently, and it turns out it really does matter, particularly for critic learning, if you don’t handle terminated vs truncated mechanics in your GAE code, and your env is almost always truncating (e.g. most continuous control tasks)
0
0
0
26
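A minimal sketch of the distinction (hypothetical helper, not from any particular codebase): a true termination zeroes the bootstrap value, a time-limit truncation still bootstraps from the critic's estimate of the next state, and either kind of boundary cuts the GAE recursion so advantages don't leak across episodes.

```python
import numpy as np

def gae(rewards, values, next_values, terminated, truncated, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation that distinguishes terminated vs truncated.

    terminated[t]: the episode truly ended (no future return -> bootstrap is zeroed)
    truncated[t]:  a time limit was hit (the episode would continue -> keep bootstrapping)
    """
    T = len(rewards)
    advantages = np.zeros(T)
    last_gae = 0.0
    for t in reversed(range(T)):
        # Only a true termination zeroes the bootstrap; truncation does not.
        nonterminal = 0.0 if terminated[t] else 1.0
        delta = rewards[t] + gamma * next_values[t] * nonterminal - values[t]
        # A boundary of either kind cuts the recursion across episodes.
        boundary = terminated[t] or truncated[t]
        last_gae = delta + gamma * lam * nonterminal * (0.0 if boundary else last_gae)
        advantages[t] = last_gae
    return advantages, advantages + values  # (advantages, value targets)
```

Note the difference on the last step: with `next_values[t] = 2.0` the truncated branch folds that critic estimate into the advantage, while the terminated branch discards it.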
Mefaso
Mefaso@Mefaso·
@willccbb terminated vs done doesn't matter, it should be a 4 tuple at most
1
0
2
574
rain
rain@rainx0r·
@deredleritt3r how come they’re still so terrible at it then lol, the only acceleration they bring to AI research atm is solving obscure cuda errors fast
0
0
0
43
prinz
prinz@deredleritt3r·
The primary purpose of Claude Code and Codex is to accelerate AI research. Yes, these tools also help you vibe-code B2B SaaS. Yes, they also drive enterprise revenue. But that is *not* their primary purpose.
27
15
372
24.2K
rain retweeted
Luke Metro
Luke Metro@luke_metro·
lol
Luke Metro tweet media
23
20
991
183.4K
rain
rain@rainx0r·
Hate to Schmidhuber you, but no, Meta-RL is a really old idea, please stop spreading misinformation. These researchers don’t “introduce” it, they’re just applying it to LLMs.

Meta-RL, as people talk about it today, was originally introduced by [1, 2], although I'm sure the actual Jürgen could show up and bring up some citation from the 90s doing something similar. There's also a nicely written survey about it [3].

Also, I don't think this has anything to do with Continual Learning, especially not as Sutton understands and promotes it. Meta-RL is primarily just a way to incentivise test-time adaptation / task identification behaviour in the agent. I guess it's a cool buzzword for the algorithm rn tho.

[1] arxiv.org/abs/1611.02779
[2] Botvinick, M., Ritter, S., Wang, J.X., Kurth-Nelson, Z., Blundell, C. and Hassabis, D., 2019. Reinforcement learning, fast and slow. Trends in Cognitive Sciences, 23(5), pp.408-422. cell.com/trends/cogniti…
[3] arxiv.org/abs/2301.08028
0
0
22
1.6K
Ronak Malde
Ronak Malde@rronak_·
This might be my favorite paper of the year 🤯

Rich Sutton claims that current RL methods won't get us to continual learning because they don't compound upon previous knowledge; every rollout starts from scratch.

Researchers in Switzerland introduce Meta-RL, which might crack that code. Optimize across episodes with a meta-learning objective, which then incentivizes agents to explore first and then exploit. And then reflect upon previous failures for future agent runs.

Incredible results and incredible read of a paper overall.

Authors: @YulunJiang @LiangzeJ @DamienTeney @Michael_D_Moor @mariabrbic
Ronak Malde tweet media
31
103
885
120.8K
rain
rain@rainx0r·
having gotten sick at neurips I think we should shift all AI4Science efforts to eradicating the common cold, I don’t even care about curing cancer anymore
0
0
0
74
rain
rain@rainx0r·
@rosemary_ke e-mail'd since your DMs are closed, would be really keen to meet
0
0
1
142
Nan Rosemary Ke
Nan Rosemary Ke@rosemary_ke·
At NeurIPS this week. Excited to meet, please reach out.
- Focussed on Scaling LLM-RL
- Working on real world evals and long form generation (mathematical proofs/STEM)
- Scaling tasks for agents (computer use / coding / research)
3
2
49
6K
rain
rain@rainx0r·
torn between rust, zig or doing some gimmick for advent of code, I love decision paralysis
0
0
0
43
rain
rain@rainx0r·
do I really have to figure out some kind of local PR review workflow to respond to people opening up AI slop PRs on my repos 😭
0
0
0
31
rain
rain@rainx0r·
why is @github so insanely laggy for PR reviews now, the diff isn’t even that large
1
0
0
34
rain
rain@rainx0r·
@melqtx have you ssh’d into anything before
1
0
98
11.4K
mel
mel@melqtx·
can someone convince me, just one reason, to use tmux?
mel tweet media
1K
42
2.3K
437.8K
Daniel
Daniel@nearlydaniel·
Interesting asymmetry in AI interactions:

You speak faster than you can type
But you read faster than you can listen

So pure text is slow, and pure voice is slow

“Speak and read” seems like the fastest 2-way bitrate we have until brain implants
349
230
4K
232K
rain
rain@rainx0r·
@skooookum last time I used it it was laggy af especially on iPad with pencil drawing, I wonder if it’s better yet
0
0
1
4K
skooks
skooks@skooookum·
Quietly the best piece of software Apple has created in the last decade
skooks tweet media
151
58
2.7K
434.2K
Lucas Beyer (bl16)
Lucas Beyer (bl16)@giffmana·
PS: i like TPUs and MaxText is a fine codebase, but this was too funny to read, who the hell wrote this?!
2
1
70
9.5K
rain
rain@rainx0r·
3.14 doesn’t remove the GIL. the free-threaded python build, which has existed since 3.13, does; it’s just that the newer version (3.14t) isn’t considered experimental anymore.

just getting things installed on 3.14 isn’t enough: you need to get things installed on 3.14t *and* those libraries have to support free-threaded python to work.

a lot of libraries that are written in other languages like C and just have python bindings still don’t support free-threaded python, and that’s the real bottleneck, rather than 3.14 no longer marking it as experimental. the major ones like NumPy already do, thankfully.
1
0
17
1.3K
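A small snippet to check both halves of this distinction (assuming CPython 3.13+, where `sys._is_gil_enabled()` and the `Py_GIL_DISABLED` build flag exist; on older interpreters it falls back to reporting the GIL as on): whether you're on a free-threaded build at all, and whether the GIL is actually disabled at runtime.

```python
import sys
import sysconfig

def gil_status():
    """Return (free_threaded_build, gil_active_now).

    Py_GIL_DISABLED is 1 only on free-threaded ("t") builds like 3.13t/3.14t;
    on older or standard builds it is 0 or None. Even on a free-threaded build
    the GIL can be re-enabled at runtime, e.g. when an extension module that
    doesn't declare free-threading support gets imported.
    """
    build_flag = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    gil_active = sys._is_gil_enabled() if hasattr(sys, "_is_gil_enabled") else True
    return build_flag, gil_active

if __name__ == "__main__":
    build_flag, gil_active = gil_status()
    print(f"free-threaded build: {build_flag}, GIL active now: {gil_active}")
```

The runtime check is the important one for the point above: installing on 3.14t only pays off if your C-extension dependencies keep the GIL from silently switching back on.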
Jeffrey Emanuel
Jeffrey Emanuel@doodlestein·
So Python 3.14 finally came out for real yesterday. Finally removing the GIL (global interpreter lock), which allows for way faster multithreaded code without dealing with all the brain damage and overhead of multiprocessing or other hacky workarounds. And uv already fully supports it, which is wildly impressive.

But anyway, I was a bit bummed out, because the main project I’m working on has a massive number of library dependencies, and it always takes a very long time to get mainline support for new python versions, particularly when they’re as revolutionary and different as version 3.14 is. So I was resigned to endure GIL-hell for the indefinite future.

But then I figured, why not? Let me just see if codex and GPT-5 can power through it all. So I backed up my settings and asked codex to try, giving it the recent blog post from the uv team to get it started.

There were some major roadblocks. I use PyTorch, which is notoriously slow to update. And also pyarrow, which also didn’t support 3.14. Same with cvxpy, the wrapper to the convex optimization library.

Still, I wanted to see what we could do even if we had to deal with the brain damage of “vendoring” some libraries and building some stuff from scratch in C++, Rust, etc. using the latest nightly GitHub repositories instead of the usual PyPi libraries. I told codex to search the web, to read GitHub issue pages, etc, so that we didn’t reinvent the wheel (or WHL I should say, 🤣) unnecessarily.

Why not? I could always test things, and if I couldn’t get it to work, then I could just retreat back to Python 3.13, right? No harm, no foul.

Well, it took many hours of work, almost all of it done by codex while I occasionally checked in with it, but it managed to get everything working! Sure, it took a bunch of iterations, and I had to go tweak some stuff to avoid annoying deprecation warnings (some of which come from other libraries, so I ultimately had to filter them). But those libraries will update over time to better support 3.14 and eventually I won’t need to use any of these annoying workarounds.

Codex even suggested uploading the compiled whl artifacts to Cloudflare’s R2 (like s3) so we could reuse them easily across machines, and took care of all the details for me. I would never think to do that on my own. Every time there was another complication or problem (for instance, what is shown in the screenshot below), codex just figured it out and plowed through it all like nothing.

If you’ve never tried to do something like this in the “bad old days” prior to LLMs, it was a thankless grind that could eat up days and then hit a roadblock, resulting in a total wipeout. So it was simply too risky to even try it most of the time; you were better off just waiting 6 or 9 months for things to become simple again.

Anyway, I still can’t really believe it’s all working! We are living in the future.
Jeffrey Emanuel tweet media
58
159
2.3K
236.3K
rain
rain@rainx0r·
it’s slower and worse than being in game, but the point isn’t to solve Minecraft, it’s to demonstrate that learning entirely in a learned world model is possible. there are many domains where the real thing isn’t a video game and model learning is the only path to large scale RL training; that’s like the whole premise of MBRL
1
0
4
2K
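That premise (learn a model from real interaction, then do most of the policy's learning inside it) long predates Dreamer; here is a toy Dyna-Q sketch on a made-up 5-state chain MDP, where imagined transitions replayed from a memorized model drive most of the value updates instead of the real environment:

```python
import random

def dyna_q(num_real_steps=200, planning_steps=20, seed=0):
    """Tiny Dyna-Q: most Q-updates come from a learned world model, not the
    real environment. Toy chain MDP: states 0..4, reward on reaching state 4."""
    rng = random.Random(seed)

    def real_step(s, a):  # the "real game"
        s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        return s2, (1.0 if s2 == 4 else 0.0)

    Q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}
    model = {}  # learned world model: (s, a) -> (s', r), memorized here
    s = 0
    for _ in range(num_real_steps):
        a = rng.choice([0, 1])  # random behaviour policy for data collection
        s2, r = real_step(s, a)
        # one update from real experience...
        Q[(s, a)] += 0.5 * (r + 0.9 * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])
        model[(s, a)] = (s2, r)
        # ...then many updates from "imagined" experience sampled from the model
        for _ in range(planning_steps):
            ps, pa = rng.choice(list(model))
            ps2, pr = model[(ps, pa)]
            Q[(ps, pa)] += 0.5 * (pr + 0.9 * max(Q[(ps2, 0)], Q[(ps2, 1)]) - Q[(ps, pa)])
        s = 0 if s2 == 4 else s2  # episode resets at the goal
    return Q
```

With 20 planning steps per real step, roughly 95% of updates never touch the real environment, which is exactly the trade the world-model setting is after when the real thing is expensive or unsafe.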
mike64_t
mike64_t@mike64_t·
just so you guys know the bottleneck for getting data out of Minecraft is literally FFmpeg and I can guarantee this model is both slower and worse than actually being ingame. The timer speed can be increased, frame capture can happen in game at scaled rate, and the thing that will make your system come to a crawl is FFmpeg and there's nothing you can do about it. You can't encode video at 1200 fps with current technology unless you're recording at like 240p or you have a 20PB SSD somewhere and decide to use avi.
alphaXiv@askalphaxiv

Crazy paper from Google DeepMind

Dreamer 4 can mine Minecraft diamonds without ever touching the real game

It trained entirely inside its own world model, kinda like an “imaginary world”, showing agents can tackle long horizon tasks safely by just learning from videos!

17
9
296
108.9K
Peter Wildeford🇺🇸🚀
Peter Wildeford🇺🇸🚀@peterwildeford·
Ignore the scary cyber results and feast your eyes on the best graph footnote of all time
Peter Wildeford🇺🇸🚀 tweet media
13
44
964
46.2K
rain
rain@rainx0r·
since when does it matter what humans do?
the human brain doesn't implement stochastic gradient descent
humans don't do pretraining on every point of a given manifold
humans don't have to do explicit long context extension midtraining
is every other layer of current deep learning systems a bad idea too because they're not what humans do?

imo trying to use what humans do as a good north star in AI is doomed in general. we're trying to train computers to solve problems to, ideally, superhuman level, and we should think about what computers are good at (memory, parallelism, compute speed) and exploit it every step of the way. RL plays to those strengths and allows you to optimise directly for success in a given task, which is extremely powerful and does work.

the real issue with RL (which is probably what you mean) is that you don't really get OOD capabilities out of it. similar to supervised learning, you have to define tasks at training time (through a reward function rather than a labeled dataset), and you can't expect good performance on things it wasn't trained on: even if you train a multi-task model on a ton of different environments you won't get human-like reasoning and online problem solving and adaptation. but that's another problem entirely. "most economically viable tasks" can just be defined as envs and solved directly.

you can get all those other things too if you're so inclined, with RL of all things even, if you make the environment general/challenging enough. since you can optimise directly for success in any task, you can just make the task "be good at learning / adapting and few-shot solving OOD tasks" (which is often referred to as meta-reinforcement learning, and there's some good results for it already)
0
0
0
206
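A toy illustration of that last point (a made-up 2-armed bandit, not any published algorithm): when return is summed across episodes within a trial and the hidden task is resampled per trial, explore-first-then-exploit is exactly the behaviour the meta objective rewards, with no per-episode reward shaping needed.

```python
import random

def run_trial(num_episodes=10, explore_episodes=2, seed=0):
    """Meta-RL-style trial on a 2-armed bandit: the task (which arm pays) is
    hidden and resampled per trial, and return is summed ACROSS episodes, so
    spending early episodes on identification pays for itself later."""
    rng = random.Random(seed)
    best_arm = rng.randint(0, 1)          # hidden task identity
    totals, counts = [0.0, 0.0], [0, 0]   # per-arm reward statistics
    trial_return = 0.0
    for ep in range(num_episodes):
        if ep < explore_episodes:
            arm = ep % 2                  # explore: try each arm once
        else:                             # exploit: pick the better empirical mean
            mean0 = totals[0] / max(counts[0], 1)
            mean1 = totals[1] / max(counts[1], 1)
            arm = 0 if mean0 >= mean1 else 1
        reward = 1.0 if arm == best_arm else 0.0
        totals[arm] += reward
        counts[arm] += 1
        trial_return += reward
    return trial_return
```

Here the explore-then-exploit strategy is hand-coded to show the incentive; in meta-RL proper the agent discovers it because the outer objective (expected trial return over the task distribution) makes it optimal.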
Andrej Karpathy
Andrej Karpathy@karpathy·
In era of pretraining, what mattered was internet text. You'd primarily want a large, diverse, high quality collection of internet documents to learn from.

In era of supervised finetuning, it was conversations. Contract workers are hired to create answers for questions, a bit like what you'd see on Stack Overflow / Quora, or etc., but geared towards LLM use cases.

Neither of the two above are going away (imo), but in this era of reinforcement learning, it is now environments. Unlike the above, they give the LLM an opportunity to actually interact - take actions, see outcomes, etc. This means you can hope to do a lot better than statistical expert imitation. And they can be used both for model training and evaluation. But just like before, the core problem now is needing a large, diverse, high quality set of environments, as exercises for the LLM to practice against.

In some ways, I'm reminded of OpenAI's very first project (gym), which was exactly a framework hoping to build a large collection of environments in the same schema, but this was way before LLMs. So the environments were simple academic control tasks of the time, like cartpole, ATARI, etc. The @PrimeIntellect environments hub (and the `verifiers` repo on GitHub) builds the modernized version specifically targeting LLMs, and it's a great effort/idea. I pitched that someone build something like it earlier this year: x.com/karpathy/statu…

Environments have the property that once the skeleton of the framework is in place, in principle the community / industry can parallelize across many different domains, which is exciting.

Final thought - personally and long-term, I am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically. I think that reward functions are super sus, and I think humans don't use RL to learn (maybe they do for some motor tasks etc, but not intellectual problem solving tasks). Humans use different learning paradigms that are significantly more powerful and sample efficient and that haven't been properly invented and scaled yet, though early sketches and ideas exist (as just one example, the idea of "system prompt learning", moving the update to tokens/contexts not weights and optionally distilling to weights as a separate process a bit like sleep does).
Prime Intellect@PrimeIntellect

Introducing the Environments Hub

RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down

We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI

257
858
7.3K
947K
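The "same schema" Karpathy refers to is the gym-style environment interface; a minimal sketch of a toy task in that shape (the task, class name, and observation format are made up for illustration, not from gym or `verifiers`):

```python
class GuessNumberEnv:
    """Toy environment in the gym-style schema:
    reset() -> (observation, info)
    step(action) -> (observation, reward, terminated, truncated, info)
    """

    def __init__(self, target=7, max_steps=10):
        self.target = target        # hidden answer the agent must find
        self.max_steps = max_steps  # time limit -> truncation, not termination

    def reset(self):
        self.steps = 0
        return {"hint": "guess an int in [0, 9]"}, {}

    def step(self, action):
        self.steps += 1
        terminated = action == self.target              # task solved
        truncated = self.steps >= self.max_steps and not terminated
        reward = 1.0 if terminated else 0.0
        hint = "correct" if terminated else ("higher" if action < self.target else "lower")
        return {"hint": hint}, reward, terminated, truncated, {}
```

Once many tasks share this skeleton, a single training or evaluation loop can iterate over all of them, which is exactly the parallelization-across-domains property described above.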