Bidipta Sarkar
@bidiptas13

243 posts

PhD Student at @flair_ox and @whi_rl | Stanford BS CS '24 @StanfordAILab | Ig @bidiptas13

Joined September 2021
101 Following · 958 Followers

Pinned Tweet
Bidipta Sarkar @bidiptas13
Introducing 🥚EGGROLL 🥚(Evolution Guided General Optimization via Low-rank Learning)! 🚀 Scaling backprop-free Evolution Strategies (ES) for billion-parameter models at large population sizes ⚡100x Training Throughput 🎯Fast Convergence 🔢Pure Int8 Pretraining of RNN LLMs
[image]
20 replies · 145 reposts · 949 likes · 262.6K views
Bidipta Sarkar @bidiptas13
@alexkrstern @oliviscusAI lol, I wonder if the bots got confused by the fact that we released an updated version on arxiv, but that was still a month ago...
0 replies · 0 reposts · 1 like · 17 views
Oliver Prompts @oliviscusAI
🚨 BREAKING: NVIDIA proved backpropagation isn't the only way to build an AI. They trained billion-parameter models without a single gradient.

Every AI you use today relies on backpropagation. It requires complex calculus, exploding memory, and massive GPU clusters.

Meanwhile, an ancient, gradient-free method called Evolution Strategies (ES) was written off as impossible to scale. Until now.

NVIDIA and Oxford just dropped EGGROLL. Instead of generating massive, full-rank matrices for every mutation, they split them into two tiny ones.

The AI mutates. It tests. It keeps what works. Like biological evolution. But now, it does it with hundreds of thousands of parallel mutations at once. Throughput is now as fast as batched inference.

They are pretraining models entirely from scratch using only simple integers. No backprop. No decimals. No gradients.

We thought the future of AI required endless clusters of precision hardware. It turns out, we just needed to evolve.
[image]
101 replies · 422 reposts · 2.4K likes · 153.7K views
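A rough sketch of the low-rank mutation idea the two tweets above describe, in plain jax (the stack the author says he builds in). Every name and number here (rank, pop_size, sigma, the toy fitness_fn, the fitness-weighted update) is an illustrative assumption, not the EGGROLL implementation; the point is only that each population member perturbs W with a thin product A @ B^T, so the full d_out × d_in noise matrix is never materialized and the whole population evaluates like one batched-inference call:

```python
# Illustrative low-rank ES sketch (NOT the EGGROLL implementation).
import jax
import jax.numpy as jnp

d_out, d_in, rank, pop_size, sigma, lr = 64, 32, 4, 128, 0.02, 0.1

def fitness_fn(W, A, B, x, y):
    # Toy fitness: negative squared error of the perturbed linear map.
    # The mutation is W + sigma * A @ B.T, applied without ever forming
    # the full d_out x d_in noise matrix (compute B.T @ x first).
    pred = W @ x + sigma * (A @ (B.T @ x))
    return -jnp.mean((pred - y) ** 2)

@jax.jit
def es_step(W, key, x, y):
    kA, kB = jax.random.split(key)
    # One thin factor pair per population member.
    A = jax.random.normal(kA, (pop_size, d_out, rank)) / jnp.sqrt(rank)
    B = jax.random.normal(kB, (pop_size, d_in, rank))
    # Evaluate the whole population in parallel, like batched inference.
    fits = jax.vmap(lambda a, b: fitness_fn(W, a, b, x, y))(A, B)
    adv = (fits - fits.mean()) / (fits.std() + 1e-8)
    # ES update: fitness-weighted average of the low-rank perturbations,
    # no backprop anywhere.
    update = jnp.einsum('p,pir,pjr->ij', adv, A, B) / pop_size
    return W + lr * update

key = jax.random.PRNGKey(0)
W = jnp.zeros((d_out, d_in))
x = jax.random.normal(jax.random.PRNGKey(1), (d_in,))
y = jax.random.normal(jax.random.PRNGKey(2), (d_out,))
for _ in range(10):
    key, sub = jax.random.split(key)
    W = es_step(W, sub, x, y)
```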
Bidipta Sarkar @bidiptas13
Please activate, Cunningham's law
0 replies · 0 reposts · 0 likes · 189 views
Bidipta Sarkar @bidiptas13
It is incredibly stupid that no transformer inference framework can handle multi-LoRA + tensor parallelism for MoE models
2 replies · 1 repost · 6 likes · 520 views
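For readers unfamiliar with the terms in that complaint: "multi-LoRA" serving means many low-rank adapters over one shared base weight, with a different adapter gathered per request. A toy jax sketch of just that piece (all names and shapes are made up for illustration); the part the tweet says no framework handles, sharding this across tensor-parallel ranks while routing tokens through MoE experts, is exactly what this sketch leaves out:

```python
# Illustrative multi-LoRA batched matmul (names and shapes made up).
import jax
import jax.numpy as jnp

n_adapters, d, r, batch = 4, 16, 2, 8
W = jax.random.normal(jax.random.PRNGKey(0), (d, d))              # shared base weight
A = jax.random.normal(jax.random.PRNGKey(1), (n_adapters, d, r))  # up-projections
B = jax.random.normal(jax.random.PRNGKey(2), (n_adapters, r, d))  # down-projections

@jax.jit
def multi_lora_matmul(x, adapter_ids):
    # x: (batch, d); adapter_ids: (batch,) selects one adapter per request.
    base = x @ W.T
    Ab, Bb = A[adapter_ids], B[adapter_ids]          # gather per-request factors
    low = jnp.einsum('brd,bd->br', Bb, x)            # down-project to rank r
    return base + jnp.einsum('bdr,br->bd', Ab, low)  # up-project and add

x = jax.random.normal(jax.random.PRNGKey(3), (batch, d))
adapter_ids = jnp.array([0, 1, 2, 3, 0, 1, 2, 3])
out = multi_lora_matmul(x, adapter_ids)  # (batch, d)
```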
Bidipta Sarkar reposted
Kenneth Stanley @kenneth0stanley
It's interesting that we seek evidence of whether something is AGI in terms of whether a single model ("it") can do something like discover general relativity. But such discoveries are population-level feats, even if an Einstein is the final generator. The reason we could get one person in 1915 who could do that is that there were almost 2 billion diverse variations on the human mind at the time to choose from.

To invest astronomical compute into a *single* model AI that could reliably do what Einstein did given knowledge up to 1911 is a vastly different problem than training 2 billion diverse models and finding the one (or it finding itself) with the right problem alignment. Humanity offers no precedent for cooking up a single model that can do anything the very best human can do in any field, though the subtlety of the distinction is easily overlooked.
Rohan Paul @rohanpaul_ai

Demis Hassabis’s “Einstein test” for defining AGI: Train a model on all human knowledge but cut it off at 1911, then see if it can independently discover general relativity (as Einstein did by 1915); if yes, it’s AGI.

51 replies · 30 reposts · 236 likes · 33.6K views
Bidipta Sarkar @bidiptas13
They say I have “un-f-able intelligence”
1 reply · 0 reposts · 5 likes · 541 views
Bidipta Sarkar @bidiptas13
Just stumbled across a wild EGGROLL on LinkedIn!
[image]
4 replies · 1 repost · 13 likes · 673 views
Bidipta Sarkar reposted
Bidipta Sarkar @bidiptas13
@max_takeoff @UnslothAI I'm personally quite inexperienced with the LLM tooling ecosystem since I just build everything from scratch in pure jax (+ cuda when needed). However, I'd be super supportive of any integrations, and our team has some WIP for vLLM!
0 replies · 0 reposts · 1 like · 68 views
Max Caldwell @max_takeoff
@bidiptas13 You seem to be investing a lot in tooling for this! Have you considered an @UnslothAI integration or something?
1 reply · 0 reposts · 1 like · 43 views
Bidipta Sarkar @bidiptas13
As promised at the end of the interview, I've made a little Christmas present for the EGGROLL community 🎁 The eggroll repo now has a simple colab notebook to guide newcomers to the codebase
Yacine Mahdid @yacinelearning

this was such an intellectually refreshing interview about evolution strategies and how interesting research like eggroll can bloom with more resources. check out the full 1h40 interview, where I held our man @bidiptas13 hostage with my questions for far too long

3 replies · 2 reposts · 26 likes · 2.8K views
Bidipta Sarkar @bidiptas13
@blackplasma22 Yeah, though I think this is more of an issue with RLVR and specifically GRPO-style normalization. Classic RL can work with extremely small batch sizes, but it is non-trivial: arxiv.org/abs/2410.14606
1 reply · 0 reposts · 2 likes · 54 views
shyam @blackplasma22
@bidiptas13 the issue with small batch sizes in RL is you get way too many 0 rewards, killing gradient updates. bigger batch = better reward distribution = better grads. however, i still had NO idea that big batch sizes weren't good for pretrain, wow
1 reply · 0 reposts · 2 likes · 74 views
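For context on the exchange above: GRPO-style normalization centers and scales rewards within the group of rollouts for a single prompt, so a small group where every rollout gets reward 0 yields all-zero advantages and no policy-gradient signal, which is the failure mode described here. A minimal illustrative sketch (not any particular library's implementation):

```python
# Illustrative GRPO-style group normalization (not a specific library's code).
import jax.numpy as jnp

def grpo_advantages(rewards, eps=1e-8):
    # rewards: (group_size,) verifiable rewards for rollouts of one prompt.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

print(grpo_advantages(jnp.array([0.0, 0.0, 1.0, 1.0])))  # mixed group: useful signal
print(grpo_advantages(jnp.array([0.0, 0.0, 0.0, 0.0])))  # all zeros: advantages vanish, no update
```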
shyam @blackplasma22
no money for compute
no decent batch size
no good gradients
no happiness
[image]
1 reply · 0 reposts · 1 like · 219 views