Mithil Vakde

608 posts


@evilmathkid

Sample efficiency research. Prev: Engineering Physics @iitbombay '23

Joined June 2023
307 Following · 3.3K Followers
Pinned Tweet
Mithil Vakde @evilmathkid
44% on ARC-AGI-1 for 67 cents! Trained from scratch in 2 hrs on a 5090. Matches TRM, beats HRM, and is way faster & cheaper. No recursion, just a transformer. Also, 7% on ARC-2 🧵
29 replies · 72 reposts · 683 likes · 53.7K views
Greg Kamradt @GregKamradt
@evilmathkid Are you going to get your submission on Kaggle for the competition?
1 reply · 0 reposts · 1 like · 443 views
Mithil Vakde @evilmathkid
@ShashwatGoel7 Fair enough. I'll post a paper once I get the world record or fail at the attempt.
1 reply · 0 reposts · 3 likes · 438 views
Shashwat Goel @ShashwatGoel7
@evilmathkid Hi Mithil, this might just be a symptom of academia being slow to cite work (IIRC your most recent thread was a week or so back?) and not citing tweets/blogs. I've heard releasing a PDF somewhere helps, as does getting it indexed on Google Scholar, for example.
1 reply · 0 reposts · 7 likes · 801 views
Mithil Vakde @evilmathkid
@MattVMacfarlane Guess I'll wait till ARC-2 opens on Kaggle at the end of this month. I'll try to get a big score. That'll force everyone to pay attention.
0 replies · 0 reposts · 7 likes · 592 views
Matthew Macfarlane @MattVMacfarlane
@evilmathkid Classic, people only care about LLMs right now 🙄. At least you don't need to worry about people working on what you're working on!
1 reply · 0 reposts · 6 likes · 669 views
Mithil Vakde @evilmathkid
I only read a new ML paper/blog if it shows new capabilities or does what was previously considered impossible. Others are mostly noise because there are so many difficult-to-control variables. E.g., who knows what data went into the LLM? I also find it hard to trust any ablation/comparison results. Did they see the same effort as the main result? Even if you did the hyperparameter sweeps, are you 100% sure there's no bug?
4 replies · 0 reposts · 14 likes · 1.4K views
Spencer Schiff @spencerschiff_
@evilmathkid Yeah, the default assumption is that AI will ultimately have extremely good sample efficiency, so it will be better than humans at never-before-seen situations.
1 reply · 0 reposts · 1 like · 89 views
Mithil Vakde @evilmathkid
100% automation for every task that is:
- verifiable,
- repeated many times today,
- or for which data collection is easy.
What's left for humans:
- never-before-seen situations,
- tasks it's impossible to collect data for.
The latter will also get automated with ASI. Many jobs today are a mix of both types of tasks.
2 replies · 1 repost · 16 likes · 1.5K views
Mithil Vakde @evilmathkid
This makes ASI way more valuable than the weak AGI algorithm (pretraining combined with RL post-training) we have today.
0 replies · 0 reposts · 2 likes · 321 views
Joey (e/λ) @shxf0072
path of pain: pip install flash-attn --no-build-isolation
15 replies · 6 reposts · 111 likes · 8.1K views
Garry Tan @garrytan
A Polish mathematician spent 20 years building a problem he said no AI could solve. GPT-5.4 cracked it on run 11. gli.st/xoxgkbvl
84 replies · 231 reposts · 1.8K likes · 202.6K views
Mithil Vakde @evilmathkid
Attributing it to RL post-training seems like confirmation bias again, even if it's likely true. There are too many confounding factors in LLMs, so I don't trust anyone who is extremely confident in their claims. I think bad science is being done by literally everyone in AI (including alignment/safety). So all I'm gonna say is: do more experiments yourself (and even then, be more skeptical).
0 replies · 0 reposts · 0 likes · 42 views
Eliezer Yudkowsky @allTheYud
@evilmathkid @grok I'm not actually going to pretend to have sharply totally revised my beliefs based on this one interaction, after, like, eight years of accumulating stuff happening since the first transformer models.
1 reply · 0 reposts · 3 likes · 146 views
Mithil Vakde @evilmathkid
@allTheYud @grok a) Making extremely sure claims first and then doing the experiments is bad. b) Your thread shows a lot of confirmation bias. I don't even disagree with you, but this doesn't look good on your part.
1 reply · 0 reposts · 3 likes · 80 views
Eliezer Yudkowsky @allTheYud
@grok Some QTers now dunking on my Grok Q&A because I performed my experiment in public, rather than prefiltering your data with a private query. Cool. You go on dunking and I'll go on doing real experiments where I don't always get the result I expect.
2 replies · 0 reposts · 49 likes · 2.9K views
Eliezer Yudkowsky @allTheYud
@grok Uh, @elder_plinius, do you know what jailbreak would cause Grok to give me an unprompted answer without that jailbreak itself influencing Grok?
18 replies · 0 reposts · 27 likes · 2.4K views