Bidipta Sarkar
@bidiptas13

243 posts

PhD Student at @flair_ox and @whi_rl | Stanford BS CS '24 @StanfordAILab | Ig @bidiptas13

Joined September 2021
101 Following · 958 Followers

Pinned Tweet
Bidipta Sarkar @bidiptas13
Introducing 🥚EGGROLL 🥚(Evolution Guided General Optimization via Low-rank Learning)! 🚀 Scaling backprop-free Evolution Strategies (ES) for billion-parameter models at large population sizes ⚡100x Training Throughput 🎯Fast Convergence 🔢Pure Int8 Pretraining of RNN LLMs
[image]
20 replies · 145 reposts · 949 likes · 262.6K views
Bidipta Sarkar @bidiptas13
@alexkrstern @oliviscusAI lol, I wonder if the bots got confused by the fact that we released an updated version on arxiv, but that was still a month ago...
0 replies · 0 reposts · 1 like · 17 views
Oliver Prompts @oliviscusAI
🚨 BREAKING: NVIDIA proved backpropagation isn't the only way to build an AI. They trained billion-parameter models without a single gradient.

Every AI you use today relies on backpropagation. It requires complex calculus, exploding memory, and massive GPU clusters.

Meanwhile, an ancient, gradient-free method called Evolution Strategies (ES) was written off as impossible to scale. Until now.

NVIDIA and Oxford just dropped EGGROLL. Instead of generating massive, full-rank matrices for every mutation, they split them into two tiny ones.

The AI mutates. It tests. It keeps what works. Like biological evolution. But now, it does it with hundreds of thousands of parallel mutations at once. Throughput is now as fast as batched inference.

They are pretraining models entirely from scratch using only simple integers. No backprop. No decimals. No gradients.

We thought the future of AI required endless clusters of precision hardware. It turns out, we just needed to evolve.
[image]
101 replies · 422 reposts · 2.4K likes · 153.7K views
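A rough sketch of the low-rank mutation idea the two tweets above describe, in plain jax (the stack the author says he builds in). Every name and number here (rank, pop_size, sigma, the toy fitness_fn, the fitness-weighted update) is an illustrative assumption, not the EGGROLL implementation; the point is only that each population member perturbs W with a thin product A @ B^T, so the full d_out × d_in noise matrix is never materialized and the whole population evaluates like one batched-inference call:

```python
# Illustrative low-rank ES sketch (NOT the EGGROLL implementation).
import jax
import jax.numpy as jnp

d_out, d_in, rank, pop_size, sigma, lr = 64, 32, 4, 128, 0.02, 0.1

def fitness_fn(W, A, B, x, y):
    # Toy fitness: negative squared error of the perturbed linear map.
    # The mutation is W + sigma * A @ B.T, applied without ever forming
    # the full d_out x d_in noise matrix (compute B.T @ x first).
    pred = W @ x + sigma * (A @ (B.T @ x))
    return -jnp.mean((pred - y) ** 2)

@jax.jit
def es_step(W, key, x, y):
    kA, kB = jax.random.split(key)
    # One thin factor pair per population member.
    A = jax.random.normal(kA, (pop_size, d_out, rank)) / jnp.sqrt(rank)
    B = jax.random.normal(kB, (pop_size, d_in, rank))
    # Evaluate the whole population in parallel, like batched inference.
    fits = jax.vmap(lambda a, b: fitness_fn(W, a, b, x, y))(A, B)
    adv = (fits - fits.mean()) / (fits.std() + 1e-8)
    # ES update: fitness-weighted average of the low-rank perturbations,
    # no backprop anywhere.
    update = jnp.einsum('p,pir,pjr->ij', adv, A, B) / pop_size
    return W + lr * update

key = jax.random.PRNGKey(0)
W = jnp.zeros((d_out, d_in))
x = jax.random.normal(jax.random.PRNGKey(1), (d_in,))
y = jax.random.normal(jax.random.PRNGKey(2), (d_out,))
for _ in range(10):
    key, sub = jax.random.split(key)
    W = es_step(W, sub, x, y)
```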
Bidipta Sarkar @bidiptas13
Please activate, Cunningham's law
0 replies · 0 reposts · 0 likes · 189 views
Bidipta Sarkar @bidiptas13
It is incredibly stupid that no transformer inference framework can handle multi-LoRA + tensor parallelism for MoE models
2 replies · 1 repost · 6 likes · 520 views
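For readers unfamiliar with the terms in that complaint: "multi-LoRA" serving means many low-rank adapters over one shared base weight, with a different adapter gathered per request. A toy jax sketch of just that piece (all names and shapes are made up for illustration); the part the tweet says no framework handles, sharding this across tensor-parallel ranks while routing tokens through MoE experts, is exactly what this sketch leaves out:

```python
# Illustrative multi-LoRA batched matmul (names and shapes made up).
import jax
import jax.numpy as jnp

n_adapters, d, r, batch = 4, 16, 2, 8
W = jax.random.normal(jax.random.PRNGKey(0), (d, d))              # shared base weight
A = jax.random.normal(jax.random.PRNGKey(1), (n_adapters, d, r))  # up-projections
B = jax.random.normal(jax.random.PRNGKey(2), (n_adapters, r, d))  # down-projections

@jax.jit
def multi_lora_matmul(x, adapter_ids):
    # x: (batch, d); adapter_ids: (batch,) selects one adapter per request.
    base = x @ W.T
    Ab, Bb = A[adapter_ids], B[adapter_ids]          # gather per-request factors
    low = jnp.einsum('brd,bd->br', Bb, x)            # down-project to rank r
    return base + jnp.einsum('bdr,br->bd', Ab, low)  # up-project and add

x = jax.random.normal(jax.random.PRNGKey(3), (batch, d))
adapter_ids = jnp.array([0, 1, 2, 3, 0, 1, 2, 3])
out = multi_lora_matmul(x, adapter_ids)  # (batch, d)
```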
Bidipta Sarkar reposted
Kenneth Stanley @kenneth0stanley
It's interesting that we seek evidence of whether something is AGI in terms of whether a single model ("it") can do something like discover general relativity. But such discoveries are population-level feats, even if an Einstein is the final generator. The reason we could get one person in 1915 who could do that is that there were almost 2 billion diverse variations on the human mind at the time to choose from.

To invest astronomical compute into a *single* model AI that could reliably do what Einstein did given knowledge up to 1911 is a vastly different problem than training 2 billion diverse models and finding the one (or it finding itself) with the right problem alignment. Humanity offers no precedent for cooking up a single model that can do anything the very best human can do in any field, though the subtlety of the distinction is easily overlooked.
Rohan Paul @rohanpaul_ai

Demis Hassabis’s “Einstein test” for defining AGI: Train a model on all human knowledge but cut it off at 1911, then see if it can independently discover general relativity (as Einstein did by 1915); if yes, it’s AGI.

51 replies · 30 reposts · 236 likes · 33.6K views
Bidipta Sarkar @bidiptas13
They say I have “un-f-able intelligence”
1 reply · 0 reposts · 5 likes · 541 views
Bidipta Sarkar @bidiptas13
Just stumbled across a wild EGGROLL on LinkedIn!
[image]
4 replies · 1 repost · 13 likes · 673 views
Bidipta Sarkar reposted
Bidipta Sarkar @bidiptas13
@max_takeoff @UnslothAI I'm personally quite inexperienced with the LLM tooling ecosystem since I just build everything from scratch in pure jax (+ cuda when needed). However, I'd be super supportive of any integrations, and our team has some WIP for vLLM!
0 replies · 0 reposts · 1 like · 68 views
Max Caldwell @max_takeoff
@bidiptas13 You seem to be investing a lot in tooling for this! Have you considered an @UnslothAI integration or something?
1 reply · 0 reposts · 1 like · 43 views
Bidipta Sarkar @bidiptas13
As promised at the end of the interview, I've made a little Christmas present for the EGGROLL community 🎁 The eggroll repo now has a simple colab notebook to guide newcomers to the codebase
Yacine Mahdid @yacinelearning

this was such an intellectually refreshing interview about evolution strategies and how interesting research like eggroll can bloom with more resources. check out the full 1h40 interview, where I held our man @bidiptas13 hostage with my questions for far too long

3 replies · 2 reposts · 26 likes · 2.8K views
Bidipta Sarkar @bidiptas13
@blackplasma22 Yeah, though I think this is more of an issue with RLVR and specifically GRPO-style normalization. Classic RL can work with extremely small batch sizes, but it is non-trivial: arxiv.org/abs/2410.14606
1 reply · 0 reposts · 2 likes · 54 views
shyam @blackplasma22
@bidiptas13 the issue with small batch sizes in RL is you get way too many 0 rewards, killing gradient updates. bigger batch = better reward distribution = better grads. however, i still had NO idea that big batch sizes weren't good for pretrain, wow
1 reply · 0 reposts · 2 likes · 74 views
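For context on the exchange above: GRPO-style normalization centers and scales rewards within the group of rollouts for a single prompt, so a small group where every rollout gets reward 0 yields all-zero advantages and no policy-gradient signal, which is the failure mode described here. A minimal illustrative sketch (not any particular library's implementation):

```python
# Illustrative GRPO-style group normalization (not a specific library's code).
import jax.numpy as jnp

def grpo_advantages(rewards, eps=1e-8):
    # rewards: (group_size,) verifiable rewards for rollouts of one prompt.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

print(grpo_advantages(jnp.array([0.0, 0.0, 1.0, 1.0])))  # mixed group: useful signal
print(grpo_advantages(jnp.array([0.0, 0.0, 0.0, 0.0])))  # all zeros: advantages vanish, no update
```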
shyam @blackplasma22
no money for compute
no decent batch size
no good gradients
no happiness
[image]
1 reply · 0 reposts · 1 like · 219 views