AAAAAAAAAA

112 posts


AAAAAAAAAA

@aaaaaaaaaaorg

a lab focused on making research more accessible.

www · Joined August 2023
6 Following · 2.1K Followers
AAAAAAAAAA retweeted
naklecha @naklecha
i’m building something new for teams that want to automate 100s of tiny research experiments on gpus (think @karpathy’s auto research). if you want to try out an early version, reach out to me!
AAAAAAAAAA retweeted
naklecha @naklecha
for people who run experiments on gpus: i built a cli tool that gives coding agents access to cloudy's infra (~2x cheaper than other sandbox / serverless clouds). with this cli, you can ask claude to: "finetune kimi k2 on 64 h100s & to save money test your finetune on 8 h100s"
AAAAAAAAAA retweeted
naklecha @naklecha
introducing simple-llm: a ~950 line, powerful & extensible inference engine that performs on par with vllm. enjoy :)
performance (gpt-oss-120b, on an h100):
- batch=1: 135 tok/s (vllm: 138)
- batch=64: 4,041 tok/s (vllm: 3,846)
github.com/naklecha/simpl…
AAAAAAAAAA retweeted
Jacob Rintamaki @jacobrintamaki
THE FINAL OFFSHORING

If you try to look to the past to understand the future of automation, the future is bleak. On one hand, if we don’t go forward with robotics, then we’ll likely enter a stagnant, Malthusian hellscape of a world. But on the other hand, if we do go forward with robotics, then society might not survive the transition intact.

If these are the two default futures, then it's no wonder so many people are pessimistic. Is there an option to just have neither of these, please?

Yes, actually. But it won’t be easy. It won’t be simple. And it will require reading a lot of footnotes.
AAAAAAAAAA retweeted
naklecha @naklecha
saturday night random idea: i'm vibe reimplementing vllm into a single & simple inference file + a folder with custom kernels.
rules:
- allowed libs: torch, numpy, flash-attn
- must match vllm's gpt-oss-120b inference speed
- can optimise for the model and hardware (h100)
AAAAAAAAAA retweeted
naklecha @naklecha
Introducing "public volumes" on Cloudy - with this update, open-source projects can share as reproducible sandboxes that can be forked and mounted on GPU instances within seconds. Share, fork & mount 100TB+ sandboxes seamlessly with Cloudy. Here is a demo:
AAAAAAAAAA retweeted
Andrej Karpathy @karpathy
Excited to release new repo: nanochat! (it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from-scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script, and in as little as 4 hours you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Run efficient inference on the model in an Engine with KV cache, simple prefill/decode, and tool use (Python interpreter in a lightweight sandbox); talk to it over CLI or a ChatGPT-like WebUI
- Write a single markdown report card, summarizing and gamifying the whole thing

Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems and answer simple questions. About ~12 hours surpasses the GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth-30 model trained for 24 hours (about equal to the FLOPs of GPT-3 Small 125M, and 1/1000th of GPT-3) gets into the 40s on MMLU, the 70s on ARC-Easy, the 20s on GSM8K, etc.

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved. Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
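One step in the pipeline above that is easy to show in miniature is the "GRPO" RL step: GRPO scores a group of sampled answers to the same prompt by their reward relative to the group's own statistics. A hedged toy sketch follows (group-relative advantages only, no policy update; the numbers and helper are illustrative, nothing here is nanochat's actual code):

```python
# toy sketch of the group-relative scoring at the heart of GRPO:
# sample several answers to one prompt, then turn raw rewards into
# advantages by normalizing against the group's mean and std.

def group_advantages(rewards):
    """advantage_i = (r_i - mean(r)) / std(r), computed per group."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0          # guard the all-equal-rewards case
    return [(r - mean) / std for r in rewards]

# e.g. 4 sampled answers to one GSM8K problem, reward 1.0 = correct
rewards = [1.0, 0.0, 0.0, 1.0]
advs = group_advantages(rewards)
# correct answers get positive advantage, wrong ones negative,
# and the group's advantages always average to zero
```

Because the baseline is the group mean rather than a learned value function, this is the piece that lets GRPO skip training a critic.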
AAAAAAAAAA retweeted
naklecha @naklecha
Announcing Cloudy, a platform that seamlessly handles your training infrastructure. You can rent a single H100 or a cluster of 1000 H100s, manage petabyte-scale storage volumes & seamlessly go from running experiments to managing large scale training runs on a single interface.
AAAAAAAAAA retweeted
naklecha @naklecha
today, i'm shutting down @aaaaaaaaaaorg. although i deeply believe in a10's mission, the only right way to build a10 would be to stay open, focus on education & stay non-profit. it will make a comeback someday, self-funded. until then, i'm focused on making fuck you money.
AAAAAAAAAA retweeted
naklecha @naklecha
i'm working on an open-source repo that teaches people llm inference optimisations. rn the fastest files on the repo are at 6.5-7k tps, as compared to vllm's 10k for the same batch size. i added a c++ implementation today. feel free to try it out (wip): github.com/naklecha/llm-i…
AAAAAAAAAA retweeted
naklecha @naklecha
i'm working on a repository that implements increasingly complex llm inference optimizations as zero-abstraction, single-file implementations. to start, i put up 4 files on the repository that implement llama3.2 1b at 15tps, 60tps, 180tps, and 1385tps on my 4090.

the project will end when i am close to vllm's performance (7218 tps on my testset). then, i will do the same thing again with multiple gpus and implement parallel inference (maybe on a larger model). once i'm done with that, i'll write a blog / make a video explaining it all.

i have a few more working optimisations and a few buggy ones that i'll fix and push to the repository this week. i wanted to get the ball rolling & set up the repository for you guys to check out. feedback/thoughts are welcome :)
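The repo itself isn't quoted here, but a progression like this classically starts with KV caching. As a hedged illustration (toy code, fake "keys", op counting only; none of this is the repo's actual implementation):

```python
# toy sketch of KV caching, the classic first llm inference
# optimization. "keys" are fake 4-dim vectors; we only count how
# many key projections each decoding strategy has to build.

def make_key(token):
    # stand-in for the (expensive) key projection of one token
    return [float(token)] * 4

def decode_naive(prompt, steps):
    """rebuild every token's key on every decode step."""
    seq = list(prompt)
    key_builds = 0
    for _ in range(steps):
        keys = [make_key(t) for t in seq]   # recomputed from scratch
        key_builds += len(keys)
        seq.append(len(seq))                # fake "sampled" next token
    return seq, key_builds

def decode_cached(prompt, steps):
    """build each token's key once and keep it in a cache."""
    seq = list(prompt)
    cache = [make_key(t) for t in seq]      # prompt keys, computed once
    key_builds = len(cache)
    for _ in range(steps):
        nxt = len(seq)                      # fake "sampled" next token
        seq.append(nxt)
        cache.append(make_key(nxt))         # only the new token's key
        key_builds += 1
    return seq, key_builds

naive_seq, naive_ops = decode_naive([1, 2, 3], steps=100)
cached_seq, cached_ops = decode_cached([1, 2, 3], steps=100)
# same decoded sequence, but key builds grow quadratically without
# the cache and only linearly with it
```

The same shape of argument (identical output, strictly less recomputation) is what each later single-file optimization in a progression like this has to preserve.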
AAAAAAAAAA retweeted
Pramod Goyal @goyal__pramod
Llama 3 implementation, each matrix multiplication explained one by one. One of the greatest resources to learn about LLMs from a very low level.
AAAAAAAAAA retweeted
naklecha @naklecha
i love writing!! it gives me an excuse to learn topics and is my small way of giving back to the community (inspired by @karpathy's videos). my past blogs:
1. a blog on lcms (10k readers)
2. llama3 from scratch (14k github stars, ~100k readers)
3. an rl guide (20k readers)
English
0
2
17
1.5K
AAAAAAAAAA retweeted
naklecha @naklecha
for my next blog post -- i am working on a comprehensive guide on data parallelism. the guide will explain relevant parallelization tricks, training on thousands of h100s, networking within gpus and nodes, some relevant cuda concepts & a bunch of pytorch code :) eta ~2 months.
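The core idea a data-parallelism guide builds on can be shown in a few lines: for an equal-size split of a batch, the mean of per-worker gradients equals the full-batch gradient, so the all-reduce step loses nothing. A minimal sketch (toy 1-parameter least-squares model and made-up numbers, not from the guide):

```python
# data parallelism in one line of math: with equal-size shards,
# averaging per-worker gradients (the "all-reduce") reproduces the
# full-batch gradient exactly. toy 1-parameter least-squares model.

def grad(w, xs, ys):
    # dL/dw for L = mean((w*x - y)^2)
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
w = 0.5

full_grad = grad(w, xs, ys)

# two "workers", each holding half the batch
local_grads = [grad(w, xs[:2], ys[:2]), grad(w, xs[2:], ys[2:])]
allreduced = sum(local_grads) / len(local_grads)   # mean = all-reduce
```

This identity only holds as written for equal shard sizes and mean-reduced losses; uneven shards need a weighted average, which is one of the details such a guide would cover.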
naklecha @naklecha

today, i'm excited to release a reinforcement learning guide that carefully explains the intuition and implementation details behind every single fundamental algorithm in the field. enjoy :) naklecha.com/reinforcement-…

AAAAAAAAAA retweeted
Antaripa Saha @doesdatmaksense
with deepseek's r1 release, now is the best time to learn about reinforcement learning @naklecha already dropped a bomb guide covering almost everything for someone to get started with RL. crazy thing is it covers both theory plus code examples
AAAAAAAAAA retweeted
naklecha @naklecha
notes on my reinforcement learning for rocket league side quest so far:

i’m in awe that we live in a time where i can train a reinforcement learning model that is learning to play rocket league and, at the same time, test community-made rocket league models, with both processes running in parallel & locally on my hardware.

in the game window, i’m playing against a reinforcement learning ppo model that’s running locally. and in the vscode window, i’m training my own reinforcement learning model (just a dummy example script for now).

also, the rocket league botting ecosystem has incredible tooling where you can test out different reinforcement learning ideas against other algorithms, plus great documentation for linux and windows. it’s honestly a beautiful hidden gem of a rabbit hole :)
naklecha @naklecha

my new ml side quest -- i'm going to build rocket league bots, using reinforcement learning. my goal is to train a bot (in 1v1s) that is better than ~95% of rocket league's player base (diamond/champ rank).

AAAAAAAAAA retweeted
naklecha @naklecha
my new ml side quest -- i'm going to build rocket league bots, using reinforcement learning. my goal is to train a bot (in 1v1s) that is better than ~95% of rocket league's player base (diamond/champ rank).