Shailesh

5.9K posts

@0xThoughtVector

Idea Guy | Independent AI Researcher

Bengaluru · Joined January 2010
1.8K Following · 288 Followers
Pinned Tweet
Shailesh @0xThoughtVector
No more gentle singularity, anon.
[image]
Shailesh @0xThoughtVector
@fidjissimo Please don't make me open Atlas to code with my bud, GPT 5.4, Fidji!
Shailesh @0xThoughtVector
@tensorqt oh. Nice. Bet they were all about the Lich King fucking chicks
tensorqt @tensorqt
@0xThoughtVector i read all of them of course, they're for my entertainment. i use my own vibe-coded app to build the backbone of the stories and have agents build them
Shailesh @0xThoughtVector
I bet you could write a novel with Opus 4.6 in Claude Code.
Shailesh @0xThoughtVector
@tensorqt wait... I have so many questions about the 80k word pieces. what were they about? did you proof-read them? did other humans read them? Anyway. I think writers SHOULD start using these tools, just like coders.
Shailesh @0xThoughtVector
Most criticisms of this paper have been that the chosen languages are unintuitive and would be extremely hard for even humans to adapt to. This raises the question of how much generalization you can, or should, actually expect. Translating Python to JavaScript is expected, but Python to Brainfuck is not. How do you measure the degree of generalization a task needs, and how much of it do you expect?
Lossfunk @lossfunk

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

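To make the Python-to-Brainfuck gap concrete, here is a minimal sketch, not taken from the thread or the paper: a tiny Brainfuck interpreter written in Python (`run_bf` is an illustrative helper name I'm introducing), run on `,[.,]`, the classic Brainfuck "cat" program.

```python
def run_bf(code: str, stdin: str = "") -> str:
    """Interpret Brainfuck: 8 single-character commands over a byte tape."""
    tape = [0] * 30000
    ptr = pc = inp = 0
    out = []
    # Pre-match brackets so [ and ] can jump to their partner in O(1).
    jump, stack = {}, []
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jump[i], jump[j] = j, i
    while pc < len(code):
        c = code[pc]
        if c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(chr(tape[ptr]))
        elif c == ",":
            if inp < len(stdin):
                tape[ptr] = ord(stdin[inp])
                inp += 1
            else:
                tape[ptr] = 0  # EOF convention: write a zero byte
        elif c == "[" and tape[ptr] == 0:
            pc = jump[pc]  # skip the loop body
        elif c == "]" and tape[ptr] != 0:
            pc = jump[pc]  # jump back to the loop start
        pc += 1
    return "".join(out)

# ",[.,]" is the classic Brainfuck "cat" program: echo stdin until EOF.
# The Python equivalent is a one-liner; the semantics are identical,
# but the surface syntax shares nothing with mainstream languages.
print(run_bf(",[.,]", "hi"))  # → hi
```

The cat program's logic (read, loop while nonzero, print, read) is trivial in any mainstream language; what collapses is the mapping from intent to an unfamiliar surface syntax, which is exactly the axis the benchmark varies.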
Chase Brower @ChaseBrowe32432
Opus 4.6 in webui can solve even the "extremely hard" problems btw, not sure what their precise methodology was but they must have heavily hamstrung the models.
Quoting Lossfunk @lossfunk
Shailesh @0xThoughtVector
@ChaseBrowe32432 they don't mention Opus 4.5, but they do mention GPT 5.2, which was better than Opus 4.5. Maybe the jump has been that large.
Chase Brower @ChaseBrowe32432
@0xThoughtVector I sincerely doubt we went from "only able to solve easy problems, no mediums, no hards, no extra hards" to "consistently solved every single type of problem including extra hards" from 4.5 -> 4.6. Indeed, if you examine the paper and code, it's full of methodological issues.
Shailesh retweeted
dax @thdxr
opencode 1.3.0 will no longer autoload the claude max plugin. we did our best to convince anthropic to support developer choice, but they sent lawyers. it's your right to access services however you wish, but it is also their right to block whoever they want. we can't maintain an official plugin, so it's been removed from github and marked deprecated on npm. appreciate our partners at openai, github and gitlab who are going the other direction and supporting developer freedom
Shailesh @0xThoughtVector
@petergostev The paper doesn't claim LLMs have no practical utility. The claim is around generalization to input structures where the core logic is the same but the training data is scarce.
Peter Gostev @petergostev
These kinds of claims never pass the sniff test. Benchmarks can be gamed, but if LLMs worked 0-11% of the time on real tasks (which are not part of benchmarks), nobody would ever use them for coding.
Quoting Lossfunk @lossfunk
Shailesh retweeted
Paradigma @paradigmainc
introducing Campaigns (Beta).
[image]
Shailesh @0xThoughtVector
@lossfunk I would have expected a drop in performance, but a complete collapse is very unexpected. Need to update a lot of priors.
Shailesh retweeted
Lossfunk @lossfunk
🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵
Pratyaksh Patel @baldwin_IVth
How much ML or DL do platforms like @KnotDating use? I wonder how they solve the latency problem as the models get bigger/more complex. Would love to try working on something like this.
Dhruv Trehan @dhruvtrehan9
bruh i don't know if i'm just being nitpicky, but high quality original evals / benchmarks for tasks with no easy natural language ground truth data are so hard to build
Shailesh @0xThoughtVector
@elonmusk too much emphasis on video content
Elon Musk @elonmusk
Algorithm is better today than 3 months ago?
Shailesh @0xThoughtVector
Guys I know you don't care but I cooked Butter Chicken for the first time in my life and it was almost perfect. So proud, so I wanted to share.
[image]