suraj

619 posts

@matmul

sf · Joined February 2025

221 Following · 370 Followers

suraj reposted

Matan Halevy@MatanHalevy·6 Mar

Grok 4.20 has entered the Clash! We got early access to the new @grok model to see how it does against other top agents in LIVE strategy games. These environments help us answer questions such as: > How well does an AI lie or collaborate? > Can they bring a civilization to the space age? > Why would an agent ignore its user's instructions? Some learnings from the Coup environment so far 🧵

1.1K

suraj reposted

Matan Halevy@MatanHalevy·5 Mar

x.com/i/article/2029…

1.8K

suraj@matmul·1 Mar

@loujaybee thanks for writing

Lou@loujaybee·1 Mar

@matmul Thanks for sharing @matmul !

suraj@matmul·28 Feb

background-agents.com

144

suraj reposted

shaurya@shauseth·28 Feb

even the govt can’t decide between opus 4.6 and codex 5.3

257

5.4K

175.8K

suraj reposted

Jake@JustJake·28 Feb

danielgross.com/agitrades If you haven't read it, read it

510

59.6K

suraj reposted

Тsфdiиg@tsoding·26 Feb

Insane Shadow Data Trick in C

1.2K

52.8K

suraj@matmul·19 Feb

have we come full circle, starting from text-based terminal interfaces back in the day, to GUIs, and now back to TUIs...

suraj@matmul·18 Feb

@diari_cc @rohanpaul_ai @ssankar pewdiepie deserves the credit for llm council..

Diari@diari_cc·17 Feb

@rohanpaul_ai @ssankar This is very similar to LLM Council from Karpathy and what Perplexity recently released.

4.8K

Rohan Paul@rohanpaul_ai·17 Feb

Palantir CTO @ssankar on LLM orchestration: Send one prompt to K LLMs. Each returns a full answer. A synthesis step reads all outputs, compares and reconciles them, then produces one best combined response.

110

106

1.3K

186.3K
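The orchestration pattern described in the post above (one prompt fanned out to K models, then a synthesis step over all answers) can be sketched roughly as follows. This is a minimal illustration, not anyone's actual implementation: the `Model` type, the function names, and the wording of the synthesis prompt are all hypothetical, and each "model" here is just a plain Python callable standing in for a real LLM client.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for an LLM client: takes a prompt, returns a full answer.
Model = Callable[[str], str]


def fan_out(prompt: str, models: Sequence[Model]) -> List[str]:
    """Send the same prompt to each of the K models and collect their answers."""
    return [model(prompt) for model in models]


def build_synthesis_prompt(prompt: str, answers: Sequence[str]) -> str:
    """Assemble one prompt that shows the synthesizer every candidate answer."""
    numbered = "\n\n".join(
        f"Answer {i}:\n{answer}" for i, answer in enumerate(answers, start=1)
    )
    return (
        f"Question:\n{prompt}\n\n{numbered}\n\n"
        "Compare these answers, reconcile any disagreements, "
        "and write one best combined response."
    )


def orchestrate(prompt: str, models: Sequence[Model], synthesizer: Model) -> str:
    """Fan the prompt out to all models, then let the synthesizer merge the results."""
    answers = fan_out(prompt, models)
    return synthesizer(build_synthesis_prompt(prompt, answers))


if __name__ == "__main__":
    # Toy models for demonstration only; real ones would call an LLM API.
    toy_models = [
        lambda p: "Use a hash map.",
        lambda p: "Sort first, then scan.",
    ]
    # Toy synthesizer that simply echoes the prompt it was given.
    print(orchestrate("How do I find duplicates?", toy_models, lambda p: p))
```

In practice the fan-out calls would likely run concurrently, and the synthesizer would itself be an LLM; both are left out here to keep the sketch short.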

suraj@matmul·18 Feb

craft + impact, remember this.

suraj reposted

Suraj@PwnFunction·17 Feb

any of you run claude code or codex on vps? what's the easiest way to do it?

5.6K

suraj@matmul·17 Feb

Ironically, taste is earned through hard work, deep understanding, and relentless experimentation.

Greg Brockman@gdb

taste is a new core skill

130

suraj@matmul·10 Feb

@ivanhivanov @solofounders 🙏

Ivan@ivanhivanov·9 Feb

@matmul @solofounders No way!! LFG 🔥🔥

suraj@matmul·8 Feb

im an avenger now.

257

suraj reposted

Matan Halevy@MatanHalevy·6 Feb

We've been testing Opus 4.6 early and... yeah. This thing is different. Throwing it into CivBench right now and early behavior shows it has better long-horizon strategy, deals well with hidden info and with planning against an adversary actively trying to ruin its plan over hundreds of turns where small mistakes compound. We'll kick off our next exhibition match: Opus 4.6 vs GPT-5.2, live in 15 min.

Claude@claudeai

Introducing Claude Opus 4.6. Our smartest model got an upgrade. Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes. It’s also our first Opus-class model with 1M token context in beta.

1.8K

suraj@matmul·27 Jan

@tarunnx now that i see Jon's code, i see more differences than convergence on similar optimizations. But his solution clearly has better perf than my impl.
