suraj

619 posts

@matmul

sf · Joined February 2025
221 Following · 370 Followers
suraj retweeted
Matan Halevy @MatanHalevy
Grok 4.20 has entered the Clash! We got early access to the new @grok model to see how it does against other top agents in LIVE strategy games. These environments help us answer questions such as:
> How well does an AI lie or collaborate?
> Can they bring a civilization to the space age?
> Why would an agent ignore its user's instructions?
Some learnings from the Coup environment so far 🧵
9 replies · 3 retweets · 15 likes · 1.1K views
suraj retweeted
shaurya @shauseth
even the govt can’t decide between opus 4.6 and codex 5.3
70 replies · 257 retweets · 5.4K likes · 175.8K views
suraj retweeted
Тsфdiиg @tsoding
Insane Shadow Data Trick in C
29 replies · 72 retweets · 1.2K likes · 52.8K views
suraj @matmul
have we come full circle, from text-based terminal interfaces back in the day, to GUIs, and now back to TUIs...
0 replies · 0 retweets · 2 likes · 94 views
Diari @diari_cc
@rohanpaul_ai @ssankar This is very similar to LLM Council from Karpathy and what Perplexity recently released.
3 replies · 1 retweet · 49 likes · 4.8K views
Rohan Paul @rohanpaul_ai
Palantir CTO @ssankar on LLM orchestration: Send one prompt to K LLMs. Each returns a full answer. A synthesis step reads all outputs, compares and reconciles them, then produces one best combined response.
110 replies · 106 retweets · 1.3K likes · 186.3K views
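The orchestration pattern described in the post above (one prompt to K models, then a synthesis step) can be sketched roughly as follows. This is a minimal illustration, not anyone's actual implementation: `Model`, `orchestrate`, and the wording of the synthesis prompt are hypothetical names and choices, and each model is treated as a plain function from prompt to answer rather than a real API client.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical type: a "model" is any callable mapping a prompt to a full answer.
Model = Callable[[str], str]


def orchestrate(prompt: str, models: List[Model], synthesizer: Model) -> str:
    """Send one prompt to K models, then ask a synthesizer to merge the answers."""
    if not models:
        raise ValueError("need at least one model")

    # Fan out: every model receives the same prompt, in parallel.
    # pool.map returns results in the same order as `models`.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda m: m(prompt), models))

    # Fan in: the synthesis step sees the question and all K full answers.
    numbered = "\n\n".join(
        f"Answer {i + 1}:\n{a}" for i, a in enumerate(answers)
    )
    synthesis_prompt = (
        f"Question:\n{prompt}\n\n{numbered}\n\n"
        "Compare these answers, reconcile any disagreements, "
        "and write one combined response."
    )
    return synthesizer(synthesis_prompt)
```

In practice each `Model` would wrap a call to some LLM provider, and the synthesizer would itself be an LLM; the sketch leaves both as assumptions so that it stays runnable with stub functions.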
suraj @matmul
craft + impact, remember this.
0 replies · 0 retweets · 2 likes · 79 views
suraj retweeted
Suraj @PwnFunction
any of you run claude code or codex on vps? what's the easiest way to do it?
5 replies · 2 retweets · 13 likes · 5.6K views
suraj @matmul
im an avenger now.
[image]
2 replies · 0 retweets · 7 likes · 257 views
suraj retweeted
Matan Halevy @MatanHalevy
We've been testing Opus 4.6 early and... yeah. This thing is different. Throwing it into CivBench right now, and early behavior shows it has better long-horizon strategy, deals well with hidden info, and plans against an adversary actively trying to ruin its plan over hundreds of turns where small mistakes compound. We'll kick off our next exhibition match: Opus 4.6 vs GPT-5.2, live in 15 min.
Claude@claudeai

Introducing Claude Opus 4.6. Our smartest model got an upgrade. Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes. It’s also our first Opus-class model with 1M token context in beta.

6 replies · 5 retweets · 22 likes · 1.8K views
suraj @matmul
@tarunnx now that i see Jon's code, i see more differences than convergence on similar optimizations. But his solution clearly performs better than my impl.
0 replies · 0 retweets · 1 like · 24 views
t @perfdwag
@matmul i think all these methods are in Jon Gjengset's video, did you follow that?
2 replies · 0 retweets · 1 like · 88 views
suraj @matmul
@tarunnx is it? no, haven't watched it yet. but it's common to reach similar optimizations.
0 replies · 0 retweets · 1 like · 67 views
suraj retweeted
cts🌸 @gf_256
a true gem (Linux kernel mailing list)
[image]
142 replies · 562 retweets · 9.7K likes · 375.7K views