Mithil Vakde

608 posts

@evilmathkid

Sample efficiency research. Prev: Engineering physics @iitbombay '23

Joined June 2023

307 Following · 3.3K Followers

Pinned Tweet

Mithil Vakde@evilmathkid·5 Mar

44% on ARC-AGI-1 in 67 cents! Trained from scratch in 2hrs on a 5090. Matches TRM, beats HRM and is way faster & cheaper. No recursion, just a transformer. Also, 7% on ARC-2 🧵

Mithil Vakde@evilmathkid·1d

@willdepue Will be there :)

will depue@willdepue·1d

@evilmathkid hope to see you on the leaderboard mithil!

Mithil Vakde@evilmathkid·1d

Reduce Kolmogorov complexity in a ~Turing machine defined by 8xH100s, PTX, a CPU and 10min while also optimising the hell out of the code execution. Love it

OpenAI@OpenAI

Are you up for a challenge? openai.com/parameter-golf

Mithil Vakde@evilmathkid·1d

@GregKamradt Yepp! Gonna gun for the world record

Greg Kamradt@GregKamradt·1d

@evilmathkid Are you going to get your submission on kaggle for the competition?

Mithil Vakde@evilmathkid·2d

I am sad the paper doesn't include my result :( I have one of the best non-LLM scores in the world today on public. (If not THE best)

Greg Kamradt@GregKamradt

Survey of ARC approaches over time. Fascinating look - excited to read this

Mithil Vakde@evilmathkid·1d

@tensorqt we are baiting god to give us a UV catastrophe

tensorqt@tensorqt·1d

kalomaze@kalomaze

i don't think you can really wishcast better underlying architectural primitives than any-to-any parallel communication over factorized sequences into existence, and from this point forward, it primarily looks like objective shaping changes rather than architectural ones

Mithil Vakde@evilmathkid·2d

@ShashwatGoel7 Fair enough. I'll post a paper once I get the world record or fail at the attempt

Shashwat Goel@ShashwatGoel7·2d

@evilmathkid Hi Mithil, this might just be a symptom of academia being slow to cite work (iirc your most recent thread was a week or so back?), and not citing tweets/blogs. I've heard releasing a pdf somewhere helps. Also getting it indexed on Google scholar for eg.

Mithil Vakde@evilmathkid·2d

@DarwinianVyas Nope, waiting for ARC-2 to reopen on kaggle next week x.com/evilmathkid/st…

Mithil Vakde@evilmathkid

The organisers replied btw -- they are swamped and lack bandwidth to verify. I think lots of people submitted models. Kinda bummed but I totally understand. Will post my public eval results tomorrow

MumbaiPanda@DarwinianVyas·2d

@evilmathkid Have they evaluated you on their internals yet?

Mithil Vakde@evilmathkid·2d

It beats Kaiming's VisionARC when trained on the same dataset! x.com/evilmathkid/st…

Mithil Vakde@evilmathkid

44% on ARC-AGI-1 in 67 cents! Trained from scratch in 2hrs on a 5090. Matches TRM, beats HRM and is way faster & cheaper. No recursion, just a transformer. Also, 7% on ARC-2 🧵

Mithil Vakde@evilmathkid·2d

@MattVMacfarlane Guess I wait till ARC-2 opens on kaggle end of this month. I'll try to get a big score. That'll force everyone to pay attention

Matthew Macfarlane@MattVMacfarlane·2d

@evilmathkid Classic, people only care about LLMs right now 🙄. Least you don't need to worry about people working on what you're working on!

Mithil Vakde@evilmathkid·2d

I only read a new ML paper/blog if it shows new capabilities or does what was prev considered impossible. Others are mostly noise because there are so many difficult-to-control variables. Eg: Who knows what data went into the LLM? I also find it hard to trust any ablation/comparison results. Did they see the same effort as the main result? Even if you did the hparam sweeps, are you 100% sure there's no bug?

Mithil Vakde@evilmathkid·3d

@spencerschiff_ yepp, it's been inevitable since Turing!

Spencer Schiff@spencerschiff_·3d

@evilmathkid Yeah, default assumption is that AI will ultimately have extremely good sample efficiency, so will be better than humans at never-before-seen situations

Mithil Vakde@evilmathkid·3d

100% automation for every task that is
- verifiable,
- repeated many times today
- or for which data collection is easy

What's left for humans:
- Never before seen situations
- tasks impossible to collect data for

The latter will also get automated with ASI.

Many jobs today are a mix of both types of tasks

Mithil Vakde@evilmathkid·3d

This makes ASI way more valuable than the weak AGI algorithm (pretraining, combined w RL posttraining) we have today

Mithil Vakde@evilmathkid·4d

@shxf0072 MAX_JOBS=4 for steel soul mode

Joey (e/λ)@shxf0072·4d

path of pain
pip install flash-attn --no-build-isolation

Mithil Vakde@evilmathkid·5d

@BlackHC @garrytan Look at the axes... [1, 26, 20?, 23, 4] 😭

Andreas Kirsch 🇺🇦@BlackHC·5d

@garrytan The bar charts in this slide are a crime against humanity tho

Garry Tan@garrytan·6d

A Polish mathematician spent 20 years building a problem he said no AI could solve. GPT-5.4 cracked it on run 11. gli.st/xoxgkbvl

Mithil Vakde@evilmathkid·6d

Attributing it to RL posttraining seems like confirmation bias again even if it's likely true. There are too many confounding factors in LLMs so I don't trust anyone who is extremely confident of their claims. I think bad science is being done by literally everyone in AI (including alignment/safety). So all I'm gonna say is do more experiments yourself (and even then be more skeptical)

Eliezer Yudkowsky@allTheYud·6d

@evilmathkid @grok I'm not actually going to pretend to have sharply totally revised my beliefs based on this one interaction, after, like, eight years of accumulating stuff happening since the first transformer models.

Mithil Vakde@evilmathkid·6d

@allTheYud @grok
a) making extremely sure claims first and then doing the experiments is bad
b) your thread shows a lot of confirmation bias
I don't even disagree with you. But this doesn't look good on your part

Eliezer Yudkowsky@allTheYud·6d

@grok Some QTers now dunking on my Grok Q&A because I performed my experiment in public, rather than prefiltering your data with a private query. Cool. You go on dunking and I'll go on doing real experiments where I don't always get the result I expect.

Mithil Vakde@evilmathkid·6d

@allTheYud @grok @elder_plinius I don't understand your surprise. Did you not test the models before making claims?

Eliezer Yudkowsky@allTheYud·6d

@grok Uh @elder_plinius do you know what jailbreak would cause Grok to give me an unprompted answer without that jailbreak itself influencing Grok?
