binal
@binalkp91
ai @ sequoia capital global equities
1.4K posts
Joined June 2013
1.8K Following · 506 Followers
binal retweeted
Mechanize @MechanizeWork:
We gave frontier AI coding agents 24 hours to write a complete Game Boy Advance emulator from scratch. GPT-5.5's emulator runs games best, with Claude Sonnet 4.6 and Opus 4.7 close behind. Gemini 3.1 Pro failed to produce a working emulator.
binal retweeted
Oege de Moor @oegerikus:
Security is an economic decision. For a fixed cost, within @XBOW, which model has the best odds of crafting an exploit? GPT-5.5 > Mythos > Opus 4.6 on real OSS web vulns. Curves below.
[image attached]
binal @binalkp91:
@tszzl chatgpt.ipynb somewhere is probably load-bearing
roon @tszzl:
the researchers run openai, which is why everything is named so terribly
difficultyang @difficultyang:
I have apparently accidentally reimplemented Erlang supervisors and it's pretty nice!
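The Erlang supervisor pattern referenced above can be sketched as a minimal one-for-one restart loop in Python. This is an illustration under assumptions: the function names, the restart limit, and the flaky worker are all invented here, not taken from any actual implementation.

```python
def supervise(worker, max_restarts=3):
    """One-for-one supervisor: call the worker, restart it when it
    raises, and escalate after max_restarts crashes, loosely
    mirroring an Erlang supervisor's restart intensity."""
    restarts = 0
    while True:
        try:
            return worker()
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise  # too many crashes: escalate upward

# A hypothetical worker that fails twice before succeeding.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("crash")
    return "ok"

result = supervise(flaky)
```

The worker is retried transparently; only after the restart budget is exhausted does the failure propagate, which is the core of the supervisor idea.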
binal @binalkp91:
@celestepoasts maybe I’m numb to magnitude at this point but it doesn’t seem like that much versus all the GW deals being signed?
Celeste @celestepoasts:
like ik it's easy to make fun of xai but this is a ton of compute
binal retweeted
Voxelbench @voxelbench:
GPT-5.5 Pro has ranked 1st on VoxelBench. It scores 100+ Elo points higher than GPT-5.5!
[image attached]
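For scale: under the standard Elo logistic model, a 100-point gap corresponds to roughly a 64% expected head-to-head win rate. This is a back-of-the-envelope check on the general Elo formula, not a statement about VoxelBench's own scoring methodology; the ratings below are arbitrary examples.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of A vs B under the standard Elo logistic model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# A 100-point Elo lead, with example ratings chosen for illustration.
p = elo_expected_score(1600, 1500)  # roughly 0.64
```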
Gappy (Giuseppe Paleologo) @__paleologo:
A long, long time ago, I was about to receive an offer letter and was asked what I would do if it came with a $10m guarantee. My answer: “I would hire an army of Cimmerian mercenaries, conquer your fund, see the employees driven before me, and hear the lamentations of the women.” I didn’t get that job.
binal @binalkp91:
@Miles_Brundage They said it's an "early checkpoint" in the post, which leads me to believe just 5.5, though hard to say to your point.
Miles Brundage @Miles_Brundage:
If you are surprised by the GPT-5.5 being good at cyber thing, you have Big AI Lead Delusion. There are none. (Sidenote: I'm not 100% clear if this is GPT-5.5 or GPT-5.5 Cyber. Naming conventions are so chaotic + there is ~no info on the latter, so it is hard to say.)
binal retweeted
Pau @hugemensa:
v2 for xtr-warp-rs is out, adding sharding support to the indices. The entire search pipeline has been rewritten around efficient transfers and new kernels that enable parallelization and scheduling optimizations, all while staying true to the WARP formula. Details below 👇
binal @binalkp91:
@tunahorse21 o1 pro when they let you paste in as many tokens as you wanted; i always wonder how much i cost OpenAI those first few months
tuna🍣 @tunahorse21:
some of yall never pasted your entire codebase in gpt playground with gpt 3.5 and it shows
binal @binalkp91:
@kchoudhu "why would i want to work for a street"
kchoudhu @kchoudhu:
My Jane Street interview story is that they wanted to talk to me and I had no idea who they were so I just ignored the email and kept looking for jobs in CPU engineering.
binal retweeted
Sham Kakade @ShamKakade6:
1/8 Introducing Recurrent Transformer (RT). At 300M params, RT improves validation CE over standard Transformers. The best RT model is only 6 layers, but wider at 2048, beating deeper 12- and 24-layer Transformers by trading depth for width.
[image attached]
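The depth-for-width trade above can be sanity-checked with the usual per-layer parameter estimate for a Transformer block, roughly 12·d_model² (attention ≈ 4·d², MLP with 4x expansion ≈ 8·d²), ignoring embeddings. The 6-layer width-2048 shape is from the tweet; the 24-layer baseline width of 1024 is an assumed value for illustration, not stated in the thread.

```python
def transformer_params(n_layers, d_model):
    # Rough non-embedding count per block: attention ~4*d^2 + MLP ~8*d^2
    return n_layers * 12 * d_model ** 2

wide_shallow = transformer_params(6, 2048)   # the 6-layer, width-2048 RT shape
deep_narrow = transformer_params(24, 1024)   # assumed width for a 24-layer baseline
```

Both come to about 302M parameters, consistent with the tweet's 300M scale: halving the width divides the per-layer count by four, so quadrupling the depth keeps the budget fixed.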
leo 🐾 @synthwavedd:
landing page designed 0-shot by [redacted]. everything you see is produced with basic html, css and javascript! coming soon, probably :3
[image attached]
binal retweeted
Jiawei (Joe) Zhou @jzhou_jz:
What does a good language model look like internally, in geometry? We find a simple but surprising signal: 👉 the more spread out its hidden representations are, the better it predicts (even for semantically similar contexts). ICLR 2026: arxiv.org/pdf/2506.24106. Presenting now 👇
[image attached]
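One simple way to quantify how "spread out" a set of hidden representations is would be mean pairwise cosine distance; this is an illustrative metric chosen here, not necessarily the measure used in the paper, and the toy vectors are invented.

```python
import math

def mean_pairwise_cosine_distance(vectors):
    """Average (1 - cosine similarity) over all pairs: higher means
    the vectors point in more varied directions."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    n = len(vectors)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(1 - cos(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

clustered = [[1.0, 0.01], [1.0, 0.02], [1.0, 0.03]]  # nearly collinear
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]       # well separated

low = mean_pairwise_cosine_distance(clustered)
high = mean_pairwise_cosine_distance(spread)
```

The clustered set scores near 0 while the separated set scores well above 1, matching the intuitive reading of "more spread out" as larger average angular separation.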