binal
@binalkp91
ai @ sequoia capital global equities
1.4K posts
Joined June 2013
1.8K Following · 506 Followers
binal retweeted
Mechanize @MechanizeWork:
We gave frontier AI coding agents 24 hours to write a complete Game Boy Advance emulator from scratch. GPT-5.5's emulator runs games best, with Claude Sonnet 4.6 and Opus 4.7 close behind. Gemini 3.1 Pro failed to produce a working emulator.
binal retweeted
Oege de Moor @oegerikus:
Security is an economic decision. For a fixed cost, within @XBOW, which model has the best odds of crafting an exploit? GPT-5.5 > Mythos > Opus 4.6 on real OSS web vulns. Curves below.
[image attached]
binal @binalkp91:
@tszzl chatgpt.ipynb somewhere is probably load-bearing
roon @tszzl:
the researchers run openai, which is why everything is named so terribly
difficultyang @difficultyang:
I have apparently accidentally reimplemented Erlang supervisors and it's pretty nice!
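The Erlang supervisor pattern referenced above can be sketched as a minimal one-for-one restart loop in Python. This is an illustration under assumptions: the function names, the restart limit, and the flaky worker are all invented here, not taken from any actual implementation.

```python
def supervise(worker, max_restarts=3):
    """One-for-one supervisor: call the worker, restart it when it
    raises, and escalate after max_restarts crashes, loosely
    mirroring an Erlang supervisor's restart intensity."""
    restarts = 0
    while True:
        try:
            return worker()
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise  # too many crashes: escalate upward

# A hypothetical worker that fails twice before succeeding.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("crash")
    return "ok"

result = supervise(flaky)
```

The worker is retried transparently; only after the restart budget is exhausted does the failure propagate, which is the core of the supervisor idea.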
binal @binalkp91:
@celestepoasts maybe I’m numb to magnitude at this point but it doesn’t seem like that much versus all the GW deals being signed?
Celeste @celestepoasts:
like ik it's easy to make fun of xai but this is a ton of compute
binal retweeted
Voxelbench @voxelbench:
GPT-5.5 Pro has ranked 1st on VoxelBench. It scores 100+ Elo points higher than GPT-5.5!
[image attached]
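For scale: under the standard Elo logistic model, a 100-point gap corresponds to roughly a 64% expected head-to-head win rate. This is a back-of-the-envelope check on the general Elo formula, not a statement about VoxelBench's own scoring methodology; the ratings below are arbitrary examples.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of A vs B under the standard Elo logistic model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# A 100-point Elo lead, with example ratings chosen for illustration.
p = elo_expected_score(1600, 1500)  # roughly 0.64
```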
Gappy (Giuseppe Paleologo) @__paleologo:
A long, long time ago, I was about to receive an offer letter and was asked what I would do if it came with a $10m guarantee. My answer: “I would hire an army of Cimmerian mercenaries, conquer your fund, see the employees driven before me, and hear the lamentations of the women.” I didn’t get that job.
binal @binalkp91:
@Miles_Brundage They said it's an "early checkpoint" in the post, which leads me to believe just 5.5, though hard to say to your point.
Miles Brundage @Miles_Brundage:
If you are surprised by the GPT-5.5 being good at cyber thing, you have Big AI Lead Delusion. There are none. (Sidenote: I'm not 100% clear if this is GPT-5.5 or GPT-5.5 Cyber. Naming conventions are so chaotic + there is ~no info on the latter, so it is hard to say.)
binal retweeted
Pau @hugemensa:
v2 for xtr-warp-rs is out, adding sharding support to the indices. The entire search pipeline has been rewritten around efficient transfers and new kernels that enable parallelization and scheduling optimizations, all while staying true to the WARP formula. Details below 👇
binal @binalkp91:
@tunahorse21 o1 pro when they let you paste in as many tokens as you wanted; i always wonder how much i cost OpenAI those first few months
tuna🍣 @tunahorse21:
some of yall never pasted your entire codebase in gpt playground with gpt 3.5 and it shows
binal @binalkp91:
@kchoudhu "why would i want to work for a street"
kchoudhu @kchoudhu:
My Jane Street interview story is that they wanted to talk to me and I had no idea who they were so I just ignored the email and kept looking for jobs in CPU engineering.
binal retweeted
Sham Kakade @ShamKakade6:
1/8 Introducing Recurrent Transformer (RT). At 300M params, RT improves validation CE over standard Transformers. The best RT model is only 6 layers, but wider at 2048, beating deeper 12- and 24-layer Transformers by trading depth for width.
[image attached]
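The depth-for-width trade above can be sanity-checked with the usual per-layer parameter estimate for a Transformer block, roughly 12·d_model² (attention ≈ 4·d², MLP with 4x expansion ≈ 8·d²), ignoring embeddings. The 6-layer width-2048 shape is from the tweet; the 24-layer baseline width of 1024 is an assumed value for illustration, not stated in the thread.

```python
def transformer_params(n_layers, d_model):
    # Rough non-embedding count per block: attention ~4*d^2 + MLP ~8*d^2
    return n_layers * 12 * d_model ** 2

wide_shallow = transformer_params(6, 2048)   # the 6-layer, width-2048 RT shape
deep_narrow = transformer_params(24, 1024)   # assumed width for a 24-layer baseline
```

Both come to about 302M parameters, consistent with the tweet's 300M scale: halving the width divides the per-layer count by four, so quadrupling the depth keeps the budget fixed.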
leo 🐾 @synthwavedd:
landing page designed 0-shot by [redacted]. everything you see is produced with basic html, css and javascript! coming soon, probably :3
[image attached]
binal retweeted
Jiawei (Joe) Zhou @jzhou_jz:
What does a good language model look like internally, in geometry? We find a simple but surprising signal: 👉 the more spread out its hidden representations are, the better it predicts (even for semantically similar contexts). ICLR 2026: arxiv.org/pdf/2506.24106. Presenting now 👇
[image attached]
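One simple way to quantify how "spread out" a set of hidden representations is would be mean pairwise cosine distance; this is an illustrative metric chosen here, not necessarily the measure used in the paper, and the toy vectors are invented.

```python
import math

def mean_pairwise_cosine_distance(vectors):
    """Average (1 - cosine similarity) over all pairs: higher means
    the vectors point in more varied directions."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    n = len(vectors)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(1 - cos(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

clustered = [[1.0, 0.01], [1.0, 0.02], [1.0, 0.03]]  # nearly collinear
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]       # well separated

low = mean_pairwise_cosine_distance(clustered)
high = mean_pairwise_cosine_distance(spread)
```

The clustered set scores near 0 while the separated set scores well above 1, matching the intuitive reading of "more spread out" as larger average angular separation.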